uiucDCS-R-76-818 


HEURISTICS  THAT  DYNAMICALLY  ALTER  DATA  STRUCTURES  TO 
DECREASE  THEIR  ACCESS  TIME 

by 

James  Richard  Bitner 


July, 1976



UIUCDCS-R-76-818 


HEURISTICS  THAT  DYNAMICALLY  ALTER  DATA  STRUCTURES  TO 
DECREASE  THEIR  ACCESS  TIME* 


by 


James  Richard  Bitner 


July  1976 

Department  of  Computer  Science 
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801


*This  work  was  supported  in  part  by  the  Department  of  Computer  Science 
and in part by the National Science Foundation under Grant GJ-41538
and  was  submitted  in  partial  fulfillment  of  the  requirements  for  the 
degree  of  Doctor  of  Philosophy  in  Computer  Science,  1976. 


ACKNOWLEDGMENTS 


I  am  very   grateful  to  my  advisor,  Edward  M.  Reingold,  for  his 
many  valuable  suggestions  during  the  preparation  of  this  thesis,  and  to 
the  other  members  of  my  exam  committee:  C.  W.  Gear,  David  Kuck,  David 
Muller,  and  C.  L.  Liu.  I  thank  D.  L.  Burkholder  for  his  aid  in  proving 
the  lemma  on  Page  109.  I  also  wish  to  thank  the  typists  at  Techni- 
Typist  for  their  fine  job  of  typing  this  thesis  and  the  National  Science 
Foundation  (Grants  GJ-31222  and  GJ-41538)  and  the  Department  of  Computer 
Science  for  their  financial  support.  Finally,  I  want  to  thank  my  family 
and  friends  for  their  understanding  moral  support. 



TABLE OF CONTENTS

1. INTRODUCTION

2. LINKED LISTS
   2.1 Asymptotic Results for Permutation Rules
   2.2 Rate of Convergence
   2.3 Other Permutation Rules
   2.4 A Hybrid Rule
   2.5 The First Request Rule
   2.6 Frequency Count Rule
   2.7 Limited Difference Rules
   2.8 Wait c, Move and Clear Rules
   2.9 Wait c and Move Rules
   2.10 Time Varying Distributions
   2.11 Summary and Conclusion

3. BINARY SEARCH TREES
   3.1 Transform after Every Request
   3.2 Monotonic Trees
   3.3 Cost Balanced Trees
   3.4 Double Rotations
   3.5 Summary and Conclusion

4. CONCLUSION

APPENDIX

REFERENCES

VITA


1.  INTRODUCTION 

Users  of  data  structures  frequently  ignore  some  very   valuable 
information:  the  number  of  times  each  key  is  requested.  It  is  rarely 
the  case  that  all  keys  are  equally  likely  to  be  requested;  some  keys 
will  be  accessed  frequently  and  others  only  rarely.  Because  we  search 
for  a  key  in  a  data  structure  by  examining  the  locations  in  a  certain 
order  (which  may  depend  on  the  results  of  previous  key  comparisons),  the 
position that a given key occupies is important, and data will be retrieved
much faster if high probability keys occur in the positions of
the  data  structure  that  are  searched  first,  that  is,  near  the  "top"  of 
the  structure. 

There  are  various  results  in  the  case  where  the  key  request 
probabilities  are  known  a  priori,  but  this  is  seldom  the  case.  If  the 
probability  distribution  is  not  known  beforehand,  it  must  be  observed  as 
requests  for  keys  are  made.  To  take  advantage  of  this  information,  the 
structure  must  be  dynamically  altered  so  that  high  probability  elements 
rise  to  the  "top"  of  the  structure,  and  low  probability  elements  sink  to 
the  "bottom."  The  purpose  of  this  thesis  is  to  evaluate  and  compare 
simple  heuristics  that  perform  this  alteration. 

Throughout this thesis, the expected access time will be
used as the evaluation criterion for these heuristics. If $c_i$ key
comparisons are required to locate key $k_i$, which is requested with
probability $p_i$, then the expected access time is defined by

$$\sum_{i=1}^{n} p_i c_i .$$

This is a good measure of the cost of accessing elements in the data
structure since it is equal to the two major costs, the expected number of
comparisons and the expected number of links we must traverse (in a
list or tree).
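As an illustration only, the definition can be evaluated directly for a linked list, where the key in position $i$ costs $i$ comparisons. The sketch below is not part of the original analysis; the function name `expected_access_time` and the three-key distribution are assumptions chosen for the example.

```python
from fractions import Fraction

def expected_access_time(order, prob):
    """Expected number of comparisons for a sequential search of `order`.

    order: keys listed from the front of the list to the back.
    prob:  dict mapping each key to its request probability.
    """
    # The key in position i (counting from 1) costs i comparisons.
    return sum(prob[key] * position
               for position, key in enumerate(order, start=1))

# Hypothetical distribution, used only for this example.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

favorable = expected_access_time(["a", "b", "c"], prob)
unfavorable = expected_access_time(["c", "b", "a"], prob)
print(favorable, unfavorable)  # 5/3 7/3
```

Exact fractions are used so that the two arrangements can be compared without rounding.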

We will not consider the cost of performing the dynamic alteration
because the rules we will look at are simple, and this cost should
be  quite  small.  Occasionally  it  is  not,  and  these  instances  will  be 
noted. 

Depending  on  the  exact  probability  distribution  of  key  requests, 
substantial  savings  can  be  achieved  if  the  arrangement  of  keys  in  the 
data  structure  is  favorable.  As  an  example,  let  us  consider  a  linked 
list.  If  the  order  of  the  keys  is  random  (each  of  the  n!  orderings  is 
equally likely), the expected access time is $(n+1)/2$ since, on the average,
we  must  search  half-way  down  the  list  to  find  a  given  key.  Note  that 
this  result  holds  for  any  probability  distribution  of  key  requests. 

The  optimal  arrangement  occurs  when  the  elements  of  the  list 
are  in  order  of  decreasing  probability.  The  proof  is  simple:  in  any 
other ordering there must be a key that occurs before another key
which  has  higher  probability.  Interchanging  these  two  keys  results  in 
a  list  of  lower  cost,  and  hence  the  original  arrangement  cannot  be 
optimal . 
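Both claims (the average over random orderings is $(n+1)/2$, and the decreasing-probability order is optimal) can be checked by brute force on a small list. This is a sketch under assumed names (`cost`, `best_and_average`) and an assumed three-key distribution; it enumerates all $n!$ orderings, so it is only practical for small $n$.

```python
from fractions import Fraction
from itertools import permutations

def cost(order, prob):
    """Expected access time of one ordering (front of the list first)."""
    return sum(prob[k] * pos for pos, k in enumerate(order, start=1))

def best_and_average(prob):
    """Minimum cost and average cost over all n! orderings."""
    costs = [cost(order, prob) for order in permutations(prob)]
    return min(costs), sum(costs) / len(costs)

# Hypothetical distribution: p = 6/11, 3/11, 2/11.
prob = {k: Fraction(w, 11) for k, w in zip("abc", (6, 3, 2))}

best, average = best_and_average(prob)
by_probability = sorted(prob, key=prob.get, reverse=True)

print(best, average)                  # 18/11 2
print(cost(by_probability, prob))     # 18/11
```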

An  interesting  generalization  is  given  by  Knuth  [3,  p.  400]. 
He supposes that the records are stored on tape and that the $i$-th record has
probability $p_i$ and length $L_i$. It can then be proved that arranging the
records in decreasing order of $p_i/L_i$ will yield the optimal arrangement.
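A small brute-force check of this ratio rule is sketched below. It assumes a particular cost model that the text does not spell out: reaching a record costs the total length of all records up to and including it. The names `tape_cost`, `prob` and `length`, and the numbers, are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def tape_cost(order, prob, length):
    """Expected length read, assuming a record costs the total
    length of every record up to and including itself."""
    total, read = Fraction(0), 0
    for rec in order:
        read += length[rec]
        total += prob[rec] * read
    return total

# Hypothetical records: "a" is the most probable but also the longest.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
length = {"a": 4, "b": 1, "c": 1}

# Arrange by decreasing p_i / L_i, then compare with exhaustive search.
by_ratio = sorted(prob, key=lambda r: prob[r] / length[r], reverse=True)
best = min(permutations(prob), key=lambda o: tape_cost(o, prob, length))

print(by_ratio, tape_cost(by_ratio, prob, length))  # ['b', 'c', 'a'] 11/3
```

In this example sorting by probability alone would put "a" first and cost 14/3, so the record lengths do change the best arrangement.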


If  the  keys  are  optimally  arranged,  substantial  decreases  in 
access  time  can  result  as  shown  in  Table  1. 

Table 1
Comparison of Optimal and "Random" Costs

Distribution                                   Minimum Cost                                  Cost of Random
                                                                                             Arrangement

$p_1 = 1$, $p_i = 0$ for $i > 1$               $1$                                           $(n+1)/2$

$p_i = r^i \frac{1-r}{r - r^{n+1}}$, $r < 1$   $\frac{1}{1-r} - \frac{n r^{n+1}}{r - r^{n+1}}$   $(n+1)/2$
(Geometric Distribution)                       $\approx \frac{1}{1-r}$

$p_i = \frac{1}{i H_n}$ where                  $\frac{n}{H_n} \approx \frac{n}{\ln n}$       $(n+1)/2$
$H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \ln n$
(Zipf's Law)

Distribution of English                        7.5375                                        13.5
Letters, see Kahn [5, p. 100]

Distribution of the 50 most                    12.5718                                       25.5
probable English words, see
Kucera and Francis [6]

The first distribution is rather unlikely, but points out that great
decreases can occur. The minimum cost for the geometric distribution
quickly approaches $1/(1-r)$, a constant, again much smaller than $(n+1)/2$. If
the distribution is in accordance with Zipf's Law, the random cost is
about $(\ln n)/2$ times greater than the optimum. This can mean a factor of
four or five for reasonably sized n. Both these formulas are easily derived by
substituting the $p_i$ into $\sum_{i=1}^{n} i\,p_i$ (the optimal cost). For both
English letters and the fifty most frequent English words, the random
cost is approximately twice the optimum.
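The two closed forms for the minimum cost can be checked numerically by substituting the $p_i$ into $\sum i\,p_i$. The following is a sketch with assumed helper names (`zipf`, `geometric`, `optimal_cost`); exact rational arithmetic is used so that the comparison with the closed forms is exact.

```python
from fractions import Fraction

def optimal_cost(probs):
    """Cost of the list sorted by decreasing probability: sum of i * p_i."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))

def zipf(n):
    """p_i = 1 / (i * H_n), with H_n the n-th harmonic number."""
    h_n = sum(Fraction(1, k) for k in range(1, n + 1))
    return [1 / (i * h_n) for i in range(1, n + 1)]

def geometric(n, r):
    """p_i = r^i (1 - r) / (r - r^(n+1)); r should be a Fraction below 1."""
    return [r**i * (1 - r) / (r - r**(n + 1)) for i in range(1, n + 1)]

r = Fraction(1, 2)
for n in range(2, 12):
    h_n = sum(Fraction(1, k) for k in range(1, n + 1))
    # Closed forms for the minimum cost.
    assert optimal_cost(zipf(n)) == n / h_n
    assert optimal_cost(geometric(n, r)) == 1 / (1 - r) - n * r**(n + 1) / (r - r**(n + 1))

print(optimal_cost(zipf(3)), optimal_cost(geometric(3, r)))  # 18/11 11/7
```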

The costs of the heuristics described in the following section
are shown to be at most twice the optimal cost. The exact cost,
calculated for several distributions, is well under this bound. In fact,
it is at most 38 percent larger than the optimum for these distributions.

Throughout this thesis, we will use $\ln x$ to denote the natural
logarithm of $x$ and $\log x$ to denote the base 2 logarithm. Also, the following
standard notations are used: $f(n) = O(g(n))$ means
$\lim_{n \to \infty} f(n)/g(n)$ is bounded, $f(n) = o(g(n))$ means this limit
equals zero, and $f(n) = \Omega(g(n))$ means this limit is bounded, but not
equal to zero.


2.  LINKED  LISTS 

The  first  data  structure  we  will  study  is  the  linked  list. 
Throughout most of this chapter, we make the assumption that the request
probabilities are constant with time and that the requests are
independent. Because of these assumptions, we can model the behavior of
an n-element list by a Markov chain* with n! states. Each state
corresponds to one of the different orderings of the list. This model
allows us to analyze the performance of the various rules.

This  analysis  will  be  from  two  different  points  of  view. 
The  first  supposes  that  the  number  of  key  requests  will  be  large 
compared  to  the  number  of  elements  in  the  list.  In  this  case  we  are 
only  concerned  with  the  steady  state  of  the  Markov  chain  which  tells  us 
the  asymptotic  behavior. 

The  second  point  of  view  supposes  there  will  be  relatively 
few  key  requests.  In  this  case,  it  is  important  how  quickly  the 
Markov  chain  approaches  steady  state  from  the  initial  distribution  (in 
which  each  state  is  assumed  to  be  equally  likely).  In  general,  the 
rate  of  convergence  also  indicates  how  well  a  rule  will  perform  if  the 
distribution  varies  with  time.  The  more  rapid  the  convergence,  the 
more  quickly  a  rule  can  adapt  to  a  changing  distribution. 


*See  the  appendix  for  a  summary  of  the  important  properties  of 
Markov  chains. 


2.1  Asymptotic  Results  for  Permutation  Rules 

We define a permutation rule as a set of n permutations
$\{\tau_i,\ 1 \le i \le n\}$ of the integers $\{1, \ldots, n\}$. When the key in
position $i$ is requested, $\tau_i$ is used to reorder the elements of the list.

We  will  be  primarily  concerned  with  the  following  two 
permutation  rules:  The  move  to  front  rule  moves  the  requested  key  to 
the  top  of  the  list,  and  the  transposition  rule  transposes  the  requested 
key  with  the  one  above  it.  In  both  cases,  if  the  requested  key  is 
already  at  the  top  of  the  list,  no  action  is  performed.  We  consider 
these  rules  because  they  are  simple,  and  because  the  changes  required 
on a linked list are cheaply executed. (If the list is sequentially
allocated instead of linked, the move to front rule becomes very
expensive to execute.)
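The two rules can be sketched in a few lines. For brevity this illustration uses a Python list rather than a linked list, so the move to front step here is not the cheap pointer update described above; the function names are assumptions. Each function returns the number of comparisons the search used.

```python
def move_to_front(items, key):
    """Find `key`, move it to the front, return the comparisons used."""
    i = items.index(key)              # sequential search from the front
    items.insert(0, items.pop(i))     # no effect if the key is already first
    return i + 1

def transpose(items, key):
    """Find `key`, swap it with the key just ahead of it."""
    i = items.index(key)
    if i > 0:                         # the first key stays where it is
        items[i - 1], items[i] = items[i], items[i - 1]
    return i + 1

a = list("abcd")
b = list("abcd")
print(move_to_front(a, "c"), a)   # 3 ['c', 'a', 'b', 'd']
print(transpose(b, "c"), b)       # 3 ['a', 'c', 'b', 'd']
```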

Previous  work  in  this  area  has  been  done  [2  and  7]  where  the 
cost  of  the  move  to  front  rule  is  shown  to  be  at  most  twice  the 
optimal  cost.  Rivest  [2]  determined  the  steady  state  distribution  of 
the  transposition  rule,  and  proved  that  it  has  lower  asymptotic  cost 
than  the  move  to  front  rule.  He  also  conjectured  the  transposition 
rule  to  be  the  optimal  rule  of  all  permutation  rules.  Yao  [4]  proved 
that  the  transposition  rule  is  optimal  assuming  an  optimal  rule  exists. 
The  cost  of  the  move  to  front  rule  has  been  determined  [3,7,10]  and 
analyzed  by  Knuth  [3]  in  the  case  of  Zipf's  Law.  Finally,  Hendricks 
[8]  has  determined  the  steady  state  distribution  for  the  move  to  front 
rule.  These  results  will  be  proved  in  this  section. 


As  noted  in  the  appendix,  both  heuristics  have  steady  state 
distributions  and  approach  them  from  any  initial  distribution,  and  the 
asymptotic  access  time  is  the  expected  access  time  for  the  steady  state 
distribution.  We  begin  the  analysis  by  determining  the  steady  state 
distribution  for  the  move  to  the  front  rule. 

Theorem (Hendricks [8]): Consider any arrangement of n keys and label
them $k_1, \ldots, k_n$ with probabilities $p_1, \ldots, p_n$ respectively. Using
the move to the front rule, the steady state probability of this arrangement is

$$P(k_1, \ldots, k_n) = \frac{\prod_{i=1}^{n} p_i}{\prod_{i=1}^{n-1} \sum_{j=i+1}^{n} p_j} .$$

Proof: For the list to be in this ordering, $k_i$ must have been requested
more recently than $k_{i+1}, k_{i+2}, \ldots, k_n$, for $1 \le i \le n-1$. The
probability that $k_1$ was requested after every other key is $p_1$. The
probability that $k_2$ was requested last out of $k_2, k_3, \ldots, k_n$ is

$$\frac{p_2}{p_2 + p_3 + \cdots + p_n} .$$

In general, the probability that $k_i$ was requested last out of
$k_i, k_{i+1}, \ldots, k_n$ is

$$\frac{p_i}{p_i + \cdots + p_n} .$$

Multiplying these probabilities gives the probability of the required
sequence of key requests, which is

$$\frac{\prod_{j=1}^{n} p_j}{\prod_{i=2}^{n} \sum_{j=i}^{n} p_j} = \frac{\prod_{i=1}^{n} p_i}{\prod_{i=1}^{n-1} \sum_{j=i+1}^{n} p_j} .$$

□
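The product formula can be checked numerically on a small list by pushing the claimed distribution through one request and confirming that it does not change. This is a sketch with assumed function names, a Zipf distribution on three keys, and all probabilities assumed nonzero; it enumerates all $n!$ states.

```python
from fractions import Fraction
from itertools import permutations

def mtf_stationary(order, prob):
    """Product formula: each key was requested last among itself
    and the keys behind it."""
    result = Fraction(1)
    remaining = sum(prob.values())    # mass of the keys not yet placed
    for key in order:
        result *= prob[key] / remaining
        remaining -= prob[key]
    return result

def step_mtf(order, key):
    """Ordering after `key` is requested under the move to front rule."""
    return (key,) + tuple(k for k in order if k != key)

def is_stationary(dist, prob, step):
    """Apply one random request to `dist` and compare with the original."""
    after = {state: Fraction(0) for state in dist}
    for state, mass in dist.items():
        for key, p in prob.items():
            after[step(state, key)] += mass * p
    return after == dist

prob = {"k1": Fraction(6, 11), "k2": Fraction(3, 11), "k3": Fraction(2, 11)}
dist = {order: mtf_stationary(order, prob) for order in permutations(prob)}

print(dist[("k1", "k2", "k3")])                  # 18/55
print(sum(dist.values()) == 1)                   # True
print(is_stationary(dist, prob, step_mtf))       # True
```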


A more interesting statistic than the steady state distribution
is the asymptotic access time (or "cost"). This is determined in the
following theorem.

Theorem (Knuth [3, p. 399], Burville and Kingman [7] and McCabe [10]):
Given keys $k_1, k_2, \ldots, k_n$ having probabilities $p_1, p_2, \ldots, p_n$, the
asymptotic cost for a list ordered by the move to front rule is

$$1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j} .$$

Proof: If we let $\ell_i$ be a random variable denoting the location of $k_i$,

$$E(\text{Cost}) = E\Big(\sum_{i=1}^{n} p_i \ell_i\Big) = \sum_{i=1}^{n} E(p_i \ell_i)$$

since the expectation of a sum equals the sum of the expectations, and

$$E(\text{Cost}) = \sum_{i=1}^{n} p_i E(\ell_i)$$

since each $p_i$ is a constant. To determine $E(\ell_i)$, define for $j \ne i$
random variables

$$A_{ji} = \begin{cases} 1 & \text{if } k_j \text{ is ahead of } k_i \text{ in the list} \\ 0 & \text{if not.} \end{cases}$$

Since a given key's position is just one more than the number of keys
ahead of it,

$$\ell_i = 1 + \sum_{j \ne i} A_{ji}, \quad \text{and} \quad E(\ell_i) = 1 + \sum_{j \ne i} E(A_{ji}) .$$

But

$$E(A_{ji}) = 1 \cdot \text{Prob}(k_j \text{ ahead of } k_i) + 0 \cdot \text{Prob}(k_j \text{ not ahead of } k_i) = \text{Prob}(k_j \text{ ahead of } k_i) .$$

Therefore

$$E(\ell_i) = 1 + \sum_{j \ne i} \text{Prob}(k_j \text{ ahead of } k_i)$$

and

$$E(\text{Cost}) = \sum_{i=1}^{n} p_i \Big(1 + \sum_{j \ne i} \text{Prob}(k_j \text{ ahead of } k_i)\Big)$$

$$E(\text{Cost}) = 1 + \sum_{i=1}^{n} p_i \sum_{j \ne i} \text{Prob}(k_j \text{ ahead of } k_i) .$$

This last relation is very important and much use will be made of it.

Prob($k_j$ is ahead of $k_i$) is just the probability $k_j$ was
requested after $k_i$ and therefore equals $\frac{p_j}{p_i + p_j}$. Substituting
this into the cost formula gives

$$1 + \sum_{i=1}^{n} \sum_{j \ne i} \frac{p_i p_j}{p_i + p_j} = 1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j} .$$
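The closed form is easy to evaluate. As a hedged illustration (the function name `mtf_cost` is an assumption), the following sketch evaluates it exactly for Zipf's Law with three keys, where the value 102/55 is approximately 1.8545.

```python
from fractions import Fraction
from itertools import combinations

def mtf_cost(probs):
    """Asymptotic move to front cost: 1 + 2 * sum over pairs of p*q/(p+q)."""
    return 1 + 2 * sum(p * q / (p + q) for p, q in combinations(probs, 2))

# Zipf's Law with n = 3: probabilities proportional to 1, 1/2, 1/3.
zipf3 = [Fraction(6, 11), Fraction(3, 11), Fraction(2, 11)]

cost = mtf_cost(zipf3)
optimum = sum(i * p for i, p in enumerate(zipf3, start=1))

print(cost, float(cost))                    # 102/55 1.8545454545454545
print(float(100 * (cost / optimum - 1)))    # about 13.3 percent above the optimum
```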

Table  2.1.1  gives  an  indication  of  the  magnitude  of  this  cost. 
We  can  see  from  this  table  that  the  move  to  the  front  rule  compares  quite 
favorably  with  the  optimum.  The  increase  for  the  geometric  distribution 
appears  to  reach  a  limit  of  26.4%.  Although  the  increase  for  Zipf's  Law 
and  large  n  cannot  be  seen  from  this  table,  Knuth  [3,  p.  399]  has 
shown  it  to  be  approximately  38  percent  for  large  n.  The  other  two 
distributions  considered  have  increases  of  27.8  percent  and  32.6  percent, 
which  are  also  in  the  same  range. 

The  steady  state  distribution  of  the  transposition  rule  can 
also  be  determined. 
Table 2.1.1

Asymptotic Cost of the Move to the Front Rule
Compared with the Optimal Cost

ENGLISH LETTERS

      OPTIMAL    MOVE TO FRONT RULE    % INCREASE
      7.5375     9.6359                27.8

ENGLISH WORDS

      OPTIMAL    MOVE TO FRONT RULE    % INCREASE
      12.5718    16.6667               32.6

GEOMETRIC DISTRIBUTION WITH N ELEMENTS

 N    OPTIMAL    MOVE TO FRONT RULE    % INCREASE
 3    1.5714     1.8000                14.5
 4    1.7333     2.0607                18.9
 5    1.8387     2.2392                21.8
 6    1.9048     2.3550                23.6
 7    1.9449     2.4270                24.8
 8    1.9686     2.4704                25.5
 9    1.9824     2.4958                25.9
10    1.9902     2.5105                26.1
11    1.9946     2.5188                26.3
12    1.9971     2.5234                26.4
13    1.9984     2.5260                26.4
14    1.9991     2.5274                26.4
15    1.9995     2.5281                26.4
16    1.9998     2.5285                26.4
17    1.9999     2.5287                26.4
18    1.9999     2.5289                26.4
19    2.0000     2.5289                26.4
20    2.0000     2.5290                26.4

ZIPF'S LAW WITH N ELEMENTS

 N    OPTIMAL    MOVE TO FRONT RULE    % INCREASE
 3    1.6364     1.8545                13.3
 4    1.9200     2.2411                16.7
 5    2.1898     2.6104                19.2
 6    2.4490     2.9660                21.1
 7    2.6997     3.3107                22.6
 8    2.9435     3.6462                23.9
 9    3.1814     3.9739                24.9
10    3.4142     4.2949                25.8
11    3.6425     4.6100                26.6
12    3.8670     4.9198                27.2
13    4.0879     5.2248                27.8
14    4.3056     5.5256                28.3
15    4.5205     5.8225                28.8
16    4.7327     6.1158                29.2
17    4.9425     6.4058                29.6
18    5.1501     6.6928                30.0
19    5.3555     6.9770                30.3
20    5.5590     7.2585                30.6


Theorem (Rivest [2]): Consider any arrangement of n keys and label
them $k_1, \ldots, k_n$ with probabilities $p_1, \ldots, p_n$ respectively. Using the
transposition rule, the steady state probability of this arrangement is

$$P(k_1, \ldots, k_n) = \frac{1}{C} \prod_{i=1}^{n} p_i^{\,n-i}$$

where

$$C = \sum_{\text{all } \Pi} \prod_{i=1}^{n} p_{\Pi_i}^{\,n-i}$$

and $\Pi = (\Pi_1, \Pi_2, \ldots, \Pi_n)$ is a permutation of $\{1, \ldots, n\}$.

Proof: We can easily verify that this is indeed a probability
distribution since all terms are nonnegative and must sum to 1 (by definition
of C).

To show it is the stationary distribution, we show $P(k_1, \ldots, k_n)$
satisfies the steady state equation:

$$P(k_1, \ldots, k_n) = p_1 P(k_1, \ldots, k_n) + \sum_{i=1}^{n-1} p_i P(k_1, k_2, \ldots, k_{i+1}, k_i, \ldots, k_n). \qquad (1)$$

From the definition of P we can see that

$$P(k_1, \ldots, k_{i+1}, k_i, \ldots, k_n) = \frac{p_{i+1}}{p_i} \cdot P(k_1, \ldots, k_n).$$

Substituting this in (1) gives

$$\sum_{i=1}^{n-1} P(k_1, \ldots, k_i, k_{i+1}, \ldots, k_n) \frac{p_{i+1}}{p_i}\, p_i + P(k_1, \ldots, k_n)\, p_1 = P(k_1, \ldots, k_n) \Big( \sum_{i=1}^{n-1} p_{i+1} + p_1 \Big) = P(k_1, \ldots, k_n).$$

□

Hence the distribution is the steady state distribution.
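For a small list the steady state cost can be obtained directly from this distribution by enumerating all orderings. The sketch below assumes the exponent $n-i$ on the key in position $i$ and uses assumed function names; for Zipf's Law with three keys it gives 20/11, approximately 1.818.

```python
from fractions import Fraction
from itertools import permutations

def transposition_stationary(prob):
    """Steady state of the transposition rule: the key in position i
    (counting from 1) contributes p^(n - i); C normalizes the weights."""
    n = len(prob)
    weight = {}
    for order in permutations(prob):
        w = Fraction(1)
        for i, key in enumerate(order, start=1):
            w *= prob[key] ** (n - i)
        weight[order] = w
    total = sum(weight.values())              # the constant C
    return {order: w / total for order, w in weight.items()}

def expected_cost(dist, prob):
    """Cost of each state weighted by its steady state probability."""
    return sum(mass * sum(prob[k] * pos for pos, k in enumerate(order, start=1))
               for order, mass in dist.items())

# Zipf's Law with n = 3.
prob = {"k1": Fraction(6, 11), "k2": Fraction(3, 11), "k3": Fraction(2, 11)}
dist = transposition_stationary(prob)
cost = expected_cost(dist, prob)
print(cost, float(cost))   # 20/11 1.8181818181818181
```

The enumeration grows as $n!$, so this approach is only a check for small lists.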

From  this  distribution,  we  can  determine  the  steady  state  cost 
by  multiplying  the  cost  of  each  state  by  its  probability  and  then  summing 
over  all  states.  We  have  done  this  for  Zipf's  distribution  and  the 
results  are  summarized  in  Table  2.1.2.  It  is  interesting  to  note  that 
the  difference  from  the  optimum  decreases  as  n  increases  (the  difference 
increases for the move to the front rule). Also, the percentage increase
is noticeably smaller than the 38 percent of the move to the
front  rule.  In  fact,  Rivest  [2]  has  shown  this  must  hold  for  any 
distribution. 

Table 2.1.2

Asymptotic Cost of the Transposition Rule
Compared with the Optimum

ASYMPTOTIC COST FOR ZIPF'S LAW WITH N ELEMENTS

 N    OPTIMAL    TRANSPOSITION RULE    % INCREASE
 3    1.6364     1.8181                11.1
 4    1.9200     2.1392                11.4
 5    2.1898     2.4304                11.0
 6    2.4490     2.7042                10.4
 7    2.6997     2.9662                 9.9
 8    2.9435     3.2191                 9.4
 9    3.1814     3.4648                 8.9

Theorem (Rivest [2]): For any probability distribution, the cost of the
transposition rule is less than or equal to that of the move to front
rule.

Proof: Consider Prob($k_i$ is ahead of $k_j$) for the transposition rule.
This is merely the sum of the probabilities of all states where $k_i$ is
ahead of $k_j$. These states have probabilities of the form

$$\frac{p_i^x p_j^y z}{C}$$

(assume $p_i \ge p_j$), where $x > y$ and $z$ is a product of powers of the other
p's. We can pair each $p_i^x p_j^y z$ in the numerator with two terms
($p_i^x p_j^y z$ and $p_i^y p_j^x z$) in the denominator. Since

$$\frac{p_i^x p_j^y z}{p_i^x p_j^y z + p_i^y p_j^x z} = \frac{p_i^{x-y}}{p_i^{x-y} + p_j^{x-y}},$$

we get

$$p_i^x p_j^y z = \frac{p_i^{x-y}}{p_i^{x-y} + p_j^{x-y}} \left( p_i^x p_j^y z + p_i^y p_j^x z \right).$$

Since

$$\frac{p_i}{p_i + p_j} \le \frac{p_i^{x-y}}{p_i^{x-y} + p_j^{x-y}},$$

we have

$$\frac{p_i}{p_i + p_j} \left( p_i^x p_j^y z + p_i^y p_j^x z \right) \le p_i^x p_j^y z .$$

Summing over all states with $x > y$ and dividing by C gives

$$\frac{p_i}{p_i + p_j} \le \text{Prob}(k_i \text{ ahead of } k_j).$$

Since $\frac{p_i}{p_i + p_j}$ = Prob($k_i$ ahead of $k_j$) using the move to the
front rule, and

$$E(\text{Cost}) = 1 + \sum_{i=1}^{n} p_i \sum_{j \ne i} \text{Prob}(k_j \text{ ahead of } k_i),$$

the transposition rule is better than (or the same as) the move to
the front rule. □

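The key inequality in this proof can be observed on a small example: under the transposition rule the more probable key is ahead of the less probable one at least as often as under the move to front rule. The sketch below uses assumed function names and Zipf's Law with three keys; the steady state weights use the exponent $n-i$ for the key in position $i$.

```python
from fractions import Fraction
from itertools import permutations

def transposition_stationary(prob):
    """Steady state of the transposition rule, by enumeration."""
    n = len(prob)
    weight = {}
    for order in permutations(prob):
        w = Fraction(1)
        for i, key in enumerate(order, start=1):
            w *= prob[key] ** (n - i)
        weight[order] = w
    total = sum(weight.values())
    return {order: w / total for order, w in weight.items()}

def prob_ahead(dist, first, second):
    """Total probability of the states in which `first` precedes `second`."""
    return sum(mass for order, mass in dist.items()
               if order.index(first) < order.index(second))

prob = {"k1": Fraction(6, 11), "k2": Fraction(3, 11), "k3": Fraction(2, 11)}
dist = transposition_stationary(prob)

ahead_transposition = prob_ahead(dist, "k1", "k2")
ahead_mtf = prob["k1"] / (prob["k1"] + prob["k2"])   # p_i / (p_i + p_j)

print(ahead_transposition, ahead_mtf)   # 17/24 2/3
```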
So the transposition rule has lower asymptotic cost than
the move to front rule. Rivest [2] has conjectured that this result
extends to all permutation rules, i.e. that the transposition rule is the
optimal rule (has lowest asymptotic cost for any probability
distribution) out of all permutation rules.

Intuitively,  this  conjecture  is  not  surprising.  The  best  we 
could  possibly  do  (see  Section  2.6)  is  to  count  the  number  of  times 
each  key  has  been  accessed,  and  keep  the  keys  ordered  with  respect  to 
this  count.  The  rule  which  most  closely  approximates  this  strategy  is 
the transposition rule. We can also look at the situation in a different
way: After a long time, the high probability keys are near the
front  of  the  list,  and  the  low  probability  keys  near  the  bottom. 
Occasionally,  a  low  probability  key  will  be  accessed,  and  the  move  to 
the  front  rule  will  move  it  to  the  front  of  the  list,  increasing  the 
expected  cost  since  many  high  probability  keys  have  moved  down  one 
position.  The  transposition  rule  does  not  do  this,  and  it  is  difficult 
for  the  low  probability  keys  to  rise  to  high  positions  in  the  list. 

While  we  cannot  yet  prove  the  transposition  rule  is  optimal,  it 
has  been  shown  by  Yao  [4]  that  if  an  optimal  permutation  rule  (optimal 
for  all  distributions)  does  exist,  it  must  be  the  transposition  rule. 
He does this by showing a particular distribution for which the transposition
rule is optimal. Before discussing Yao's proof, we need a
theorem  by  Rivest  [2]. 
Theorem (Rivest [2]): An optimal permutation rule, $\{\tau_j,\ 1 \le j \le n\}$
($\tau_j$ is used when the $j$-th key is requested) must have the property that
each $\tau_j$:

(i) leaves positions $j+1$ to $n$ of the list fixed
(ii) if $j > 1$, moves the key in position $j$ to some
position $j' < j$.

Proof: Consider the probability distribution $p_i = 1/k$ for $1 \le i \le k$ and
$p_i = 0$ for $k < i \le n$, for some $k < n$. Any permutation rule satisfying
(i) and (ii) above will have an asymptotic cost of $(k+1)/2$ since all of
the keys with zero probability will move to the end of the list and
stay there. Any permutation rule which violates (i) will occasionally
move a key with zero probability in front of one with nonzero
probability, and thus have greater asymptotic cost. Any permutation rule
which satisfies (i) but not (ii) will not be able to move any keys out
of positions $j$ such that $\tau_j(j') = j$, so that the optimal ordering for
this particular probability distribution cannot be reached, and, again,
the asymptotic cost will be higher.

Theorem (Yao [4]): Given a list of n elements with probability
distribution $p_1 = 1 - \varepsilon$ and $p_i = \frac{\varepsilon}{n-1}$, $2 \le i \le n$, there is
an $\varepsilon$ small enough such that the transposition rule is optimal for this
distribution.

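Proof: The Markov chain corresponding to this list has n distinct
states, each one having $k_1$ (the key with probability $1 - \varepsilon$) in a different
position. Let $q_i$ be the steady state probability that $k_1$ occupies
position $i$, using the transposition rule, and let $r_i$ be this probability
using the optimal rule. The transposition steady state satisfies:

$$q_1 = \Big(1 - \frac{\varepsilon}{n-1}\Big) q_1 + (1 - \varepsilon)\, q_2$$

$$q_2 = \frac{\varepsilon}{n-1}\, q_1 + \frac{n-2}{n-1}\, \varepsilon\, q_2 + (1 - \varepsilon)\, q_3$$

$$\vdots$$

$$q_{n-1} = \frac{\varepsilon}{n-1}\, q_{n-2} + \frac{n-2}{n-1}\, \varepsilon\, q_{n-1} + (1 - \varepsilon)\, q_n$$

$$q_n = \frac{\varepsilon}{n-1}\, q_{n-1} + \varepsilon\, q_n$$

We solve this to get:

$$q_{j+1} = \frac{\varepsilon}{n-1} \cdot \frac{1}{1 - \varepsilon}\, q_j \quad \text{for } j = 1, \ldots, n-1.$$

Since $\sum_{i=1}^{n} q_i = 1$, we obtain

$$q_1 = 1 + O(\varepsilon), \qquad q_j = \Big(\frac{\varepsilon}{n-1}\Big)^{j-1} + O(\varepsilon^j), \quad 2 \le j \le n.$$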
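The solution of these balance equations can be checked numerically. The sketch below builds the $q_j$ from the ratio $q_{j+1}/q_j = \varepsilon / ((n-1)(1-\varepsilon))$ and substitutes them back into the first, middle and last equations. The function name and the sample values $n = 4$, $\varepsilon = 1/10$ are assumptions for the example; `eps` should be a `Fraction` so the comparison is exact.

```python
from fractions import Fraction

def yao_transposition_steady_state(n, eps):
    """q_1..q_n for the chain tracking the position of the heavy key k_1."""
    c = eps / ((n - 1) * (1 - eps))        # ratio q_{j+1} / q_j
    weights = [c**j for j in range(n)]     # q_{j+1} / q_1 for j = 0..n-1
    total = sum(weights)
    return [w / total for w in weights]

n, eps = 4, Fraction(1, 10)
q = yao_transposition_steady_state(n, eps)
light = eps / (n - 1)                      # probability of each other key

# Substitute back into the balance equations (q[0] is q_1, and so on).
first_ok = q[0] == (1 - light) * q[0] + (1 - eps) * q[1]
middle_ok = q[1] == light * q[0] + Fraction(n - 2, n - 1) * eps * q[1] + (1 - eps) * q[2]
last_ok = q[-1] == light * q[-2] + eps * q[-1]

print(first_ok, middle_ok, last_ok)        # True True True
print([float(x) for x in q])               # q_1 is close to 1, the rest decay quickly
```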

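From Rivest's Theorem, we know an optimal rule (if one exists) must
have the form:

$$\tau_1 = \begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}$$

$$\tau_2 = \begin{pmatrix} 1 & 2 & 3 & \cdots & n \\ 2 & 1 & 3 & \cdots & n \end{pmatrix}$$

$$\tau_3 = \begin{pmatrix} 1 & 2 & 3 & 4 & \cdots & n \\ a_{31} & a_{32} & a_{33} & 4 & \cdots & n \end{pmatrix}$$

$$\vdots$$

$$\tau_{n-1} = \begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n-1} & n \end{pmatrix}$$

$$\tau_n = \begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & a_{nn} \end{pmatrix}$$
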

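The proof now proceeds inductively in $n-2$ stages. At the
$k$-th stage we will show:

1. $\tau_{k+2}$ is the same as the transposition rule.

2. $a_{ik} = k$ for $i \ge k + 2$.

3. $r_{k+1} = \frac{\varepsilon}{n-1} \cdot \frac{1}{1-\varepsilon}\, r_k = \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1})$.

Note that after stage $n-2$, we will have proven $\tau_i$ is the same as the
transposition rule for $i = 1, \ldots, n$, and hence the theorem will be proved.

To begin the induction, we note that $\tau_1$ and $\tau_2$ are the same
as the transposition rule by Rivest's Theorem. Hence Condition 1 is
initially satisfied. Note that Condition 2 vacuously holds if $k = 0$.
Finally, any rule satisfying the conditions of Rivest's Theorem has
$r_1 = 1$ when $\varepsilon = 0$. Since $r_1 = \sum_{i=0}^{\infty} d_i \varepsilon^i$, if $\varepsilon = 0$,
$r_1 = 1 = d_0$. Hence $r_1 = 1 + O(\varepsilon)$ and Condition 3 is initially satisfied.
The proof for stage $k$ proceeds as follows:

Let $N(i,j)$ be the number of $\ell$ such that $a_{\ell i} = j$. This is just
the number of requests that cause the key in position $i$ to move to
position $j$. The $r_i$ must satisfy their steady state equations. These
give us the following bound:

$$r_i \ge N(k,i) \cdot \frac{\varepsilon}{n-1}\, r_k, \qquad k+1 \le i \le n.$$

Here we have counted only those transitions from state $k$ to state $i$
and replaced the transition probability by a lower bound of $\frac{\varepsilon}{n-1}$.
Summing these inequalities gives

$$r_{k+1} + \cdots + r_n \ge \Big[ \sum_{i=k+1}^{n} N(k,i) \Big] \frac{\varepsilon}{n-1}\, r_k .$$

Set $a = \sum_{i=k+1}^{n} N(k,i)$. Note that $a$ is just the number of $j$'s such that
$a_{jk} > k$. Then

$$r_{k+1} + \cdots + r_n \ge a \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \quad \text{by Condition 3.}$$

So

$$r_{k+1} + \cdots + r_n = A \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \quad \text{for some } A \ge a. \qquad (1)$$

From this we conclude

$$(k+1)\, r_{k+1} + \cdots + n\, r_n \ge (k+1)\, A \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \qquad (2)$$

$$r_1 + \cdots + r_k = 1 - A \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}). \qquad (3)$$

We know that

$$(k+1)\, q_{k+1} + \cdots + n\, q_n = (k+1) \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \qquad (4)$$

since $q_{k+1} = \big(\frac{\varepsilon}{n-1}\big)^{k} + O(\varepsilon^{k+1})$ and $q_i \le O(\varepsilon^{k+1})$ for $i > k+1$.
Subtracting (4) from (2) gives

$$(k+1)\, r_{k+1} + \cdots + n\, r_n - (k+1)\, q_{k+1} - \cdots - n\, q_n \ge (k+1)(A-1) \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}). \qquad (5)$$

Since

$$q_1 + \cdots + q_k = 1 - q_{k+1} - \cdots - q_n$$

we have

$$q_1 + \cdots + q_k = 1 - \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}). \qquad (6)$$

Now let $c = \frac{\varepsilon}{n-1} \cdot \frac{1}{1-\varepsilon}$. We have $q_i = c^{i-1} q_1$, $i \le k$, and from
property 3,

$$r_i = c^{i-1} r_1, \qquad i \le k. \qquad (7)$$

Hence $q_1 + q_2 + \cdots + q_k = q_1 (1 + c + c^2 + \cdots + c^{k-1}) = 1 - \big(\frac{\varepsilon}{n-1}\big)^{k} + O(\varepsilon^{k+1})$, so

$$q_1 = \frac{1 - \big(\frac{\varepsilon}{n-1}\big)^{k} + O(\varepsilon^{k+1})}{1 + c + c^2 + \cdots + c^{k-1}}. \qquad (8)$$

Similarly, using (3),

$$r_1 + \cdots + r_k = r_1 (1 + c + c^2 + \cdots + c^{k-1}) = 1 - A \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}),$$

so

$$r_1 = \frac{1 - A \big(\frac{\varepsilon}{n-1}\big)^{k} + O(\varepsilon^{k+1})}{1 + c + c^2 + \cdots + c^{k-1}}. \qquad (9)$$

Now

$$q_1 + 2 q_2 + \cdots + k q_k = q_1 + 2c\, q_1 + \cdots + k c^{k-1} q_1 = q_1 (1 + 2c + \cdots + k c^{k-1}).$$

Substituting for $q_1$ from (8),

$$q_1 + 2 q_2 + \cdots + k q_k = \Big[ 1 - \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \Big] \frac{1 + 2c + \cdots + k c^{k-1}}{1 + c + \cdots + c^{k-1}}. \qquad (10)$$

Similarly

$$r_1 + 2 r_2 + \cdots + k r_k = r_1 + 2c\, r_1 + \cdots + k c^{k-1} r_1 = \Big[ 1 - A \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \Big] \frac{1 + 2c + \cdots + k c^{k-1}}{1 + c + \cdots + c^{k-1}}. \qquad (11)$$

Subtracting (11) from (10) gives

$$q_1 + 2 q_2 + \cdots + k q_k - r_1 - 2 r_2 - \cdots - k r_k = \Big[ (A-1) \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \Big] \frac{1 + 2c + \cdots + k c^{k-1}}{1 + c + \cdots + c^{k-1}}$$

$$\le \Big[ (A-1) \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}) \Big] \frac{k + kc + \cdots + k c^{k-1}}{1 + c + \cdots + c^{k-1}}$$

$$\le k (A-1) \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}). \qquad (12)$$

Finally, subtracting (12) from (5) gives

$$r_1 + 2 r_2 + \cdots + n r_n - q_1 - 2 q_2 - \cdots - n q_n \ge (A-1) \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}). \qquad (13)$$

But the left hand side of (13) is just the cost of the optimal rule
minus the cost of the transposition rule. If $A > 1$, then the transposition
rule has lower cost than the optimal rule, a contradiction. Hence
$A \le 1$ and therefore $a \le 1$. Recall that $a$ is the number of $a_{jk} > k$.
Since $\tau_{k+1}$ is the same as the transposition rule by Condition 1, we
have $a_{k+1,k} = k+1$ and hence $a \ge 1$. Hence $a$ must equal 1. Thus all
other $a_{jk} \le k$ (else, $a > 1$). If $a_{jk} < k$, Condition 2 will be violated
since this value has already appeared in the permutation. Hence $a_{jk} = k$,
proving Condition 2 for $k$. This determines the equation for $r_k$.
If $k = 1$,

$$r_1 = \Big(1 - \frac{\varepsilon}{n-1}\Big) r_1 + (1 - \varepsilon)\, r_2$$

$$r_2 = \frac{\varepsilon}{(1-\varepsilon)(n-1)}\, r_1 = \frac{\varepsilon}{n-1} + O(\varepsilon^2)$$

since $r_1 = 1 + O(\varepsilon)$. If $k > 1$,

$$r_k = \frac{\varepsilon}{n-1}\, r_{k-1} + \frac{n-2}{n-1}\, \varepsilon\, r_k + (1 - \varepsilon)\, r_{k+1}.$$

Solving this for $r_{k+1}$ and substituting $\frac{(n-1)(1-\varepsilon)}{\varepsilon}\, r_k$ for $r_{k-1}$
(Condition 3) gives $r_{k+1} = \frac{\varepsilon}{(n-1)(1-\varepsilon)}\, r_k$, and since
$r_k = \big(\frac{\varepsilon}{n-1}\big)^{k-1} + O(\varepsilon^{k})$ (again, Condition 3),

$$r_{k+1} = \Big(\frac{\varepsilon}{n-1}\Big)^{k} + O(\varepsilon^{k+1}).$$

In either case, Condition 3 is proved. All that remains is to prove
Condition 1. From Condition 2, we know $a_{k+2,i} = i$ for $i \le k-1$. We
now know $a_{k+2,k} = k$, and hence $\tau_{k+2}$ is the same as the transposition
rule, completing the induction for Condition 1 and proving the theorem. □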

A final question is how far these rules can possibly be from the optimum. This is answered for the move to the front rule by Rivest [2] and Burville and Kingman [7]. If we assume $p_1 \ge p_2 \ge \cdots \ge p_n$ are the key request probabilities, then

\[
\frac{\text{MTF Cost}}{\text{Opt Cost}}
= \frac{1 + 2\sum_{i=1}^{n}\sum_{j=1}^{i-1}\frac{p_i p_j}{p_i+p_j}}{\sum_{i=1}^{n} i\,p_i}
\le \frac{1+2x}{1+x}
\le 2\left(1 - \frac{1}{n+1}\right),
\]

where $x = \sum_{j=1}^{n} p_j (j-1)$. The first inequality holds because $\frac{p_i p_j}{p_i+p_j} \le p_i$, so the double sum is at most $x$, while the denominator equals $1+x$. The second holds since $x \le \frac{n-1}{2}$.

Therefore, the move to the front rule never does more than twice the work of the optimal ordering. The bound also holds for the transposition rule, as its cost is less than or equal to that of the move to front rule. This may be a significant savings, as remarked in the introduction.
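The bound can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names (`zipf`, `mtf_asymptotic_cost`, `optimal_cost`) are our own, and Zipf's Law is used only as a convenient test distribution. It evaluates the asymptotic move to front cost and the optimal cost exactly and compares their ratio with $2(1 - \frac{1}{n+1}) = \frac{2n}{n+1}$.

```python
from fractions import Fraction


def zipf(n):
    """Zipf's Law: p_i proportional to 1/i for i = 1..n (exact rationals)."""
    h = sum(Fraction(1, i) for i in range(1, n + 1))
    return [Fraction(1, i) / h for i in range(1, n + 1)]


def mtf_asymptotic_cost(p):
    """Asymptotic move to front cost: 1 + 2 * sum_{i<j} p_i p_j / (p_i + p_j)."""
    n = len(p)
    return 1 + 2 * sum(p[i] * p[j] / (p[i] + p[j])
                       for i in range(n) for j in range(i + 1, n))


def optimal_cost(p):
    """Cost of the fixed list ordered by decreasing request probability."""
    ordered = sorted(p, reverse=True)
    return sum((pos + 1) * q for pos, q in enumerate(ordered))


if __name__ == "__main__":
    for n in (3, 7, 20):
        p = zipf(n)
        ratio = mtf_asymptotic_cost(p) / optimal_cost(p)
        print(n, float(ratio), float(Fraction(2 * n, n + 1)))
```

For a uniform distribution the two costs coincide, so the ratio is 1; the bound is approached only by suitably skewed distributions.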

In summary, the situation in the asymptotic case is quite clear: the transposition rule has asymptotic cost less than or equal to that of the move to front rule. Both rules compare quite favorably with the optimal cost. For the distributions we considered, the transposition rule was within 10 percent of the optimum and the move to front rule ranged from 25 percent to 38 percent above it. Finally, the cost of these rules is at most twice the optimal cost for any probability distribution.

2.2  Rate  of  Convergence 

In the previous section, we only considered asymptotic behavior and found the move to the front rule inferior to the transposition rule. In this section we will consider how quickly the rules approach their asymptotes. We will find that the move to the front rule approaches its asymptote more quickly, and initially has a lower expected cost than the transposition rule.

The reason for this is clear. In the initial random ordering, many high probability elements are far down in the list. These must be brought to the front to reduce the cost. The move to the front rule will do a better job here since these keys make large jumps and quickly rise to the top. The transposition rule allows keys to move only one step at a time, so the convergence should be rather slow. When key $k_i$ is requested, it moves up one position, decreasing the cost by $p_i$ since we can locate $k_i$ with one less compare, and increasing the cost by $p_{i-1}$, since key $k_{i-1}$ (the key above $k_i$) moves down one position, resulting in a net decrease of $p_i - p_{i-1}$. If the $p_i$'s are "close" in size, they are $O(\frac{1}{n})$, and this decrease is $O(\frac{1}{n})$, resulting in a very slow convergence. We would expect the move to the front rule to take $\Omega(n)$ time to get very close to steady state, assuming $\Omega(n)$ high probability keys. The transposition rule should require $\Omega(n^2)$ since each key must move $\Omega(n)$ steps to get near the top.

To  begin  the  analysis,  we  determine  the  expected  cost  of  the 
move  to  the  front  rule  as  a  function  of  time. 


Theorem: Given keys $k_1, k_2, \ldots, k_n$ having request probabilities $p_1, p_2, \ldots, p_n$, the expected cost of accessing a list being modified by the move to front rule after $t$ requests is

\[
1 + 2\sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j}
+ \sum_{1 \le i < j \le n} \frac{(p_i - p_j)^2}{2(p_i + p_j)}\,(1 - p_i - p_j)^t .
\]

Proof: We begin by deriving Prob($k_j$ is ahead of $k_i$ at time $t$). There are two different situations that could cause $k_j$ to be ahead of $k_i$. First, neither $k_i$ nor $k_j$ was requested in $t$ requests and $k_j$ was initially ahead of $k_i$. The probability of this is $\frac{1}{2}(1 - p_i - p_j)^t$. Second, $k_j$'s most recent request was at time $m \ge 1$, and $k_i$ was not requested after time $m$. The probability for this is

\[
\sum_{m=1}^{t} (1 - p_i - p_j)^{t-m}\, p_j
= \sum_{m=0}^{t-1} (1 - p_i - p_j)^{m}\, p_j
= \frac{p_j}{p_i + p_j}\left(1 - (1 - p_i - p_j)^t\right).
\]

Adding these gives

\[
P(k_j \text{ ahead of } k_i \text{ at time } t)
= \frac{1}{2}(1 - p_i - p_j)^t + \frac{p_j}{p_i + p_j} - \frac{p_j}{p_i + p_j}(1 - p_i - p_j)^t
= \frac{p_j}{p_i + p_j} + \frac{p_i - p_j}{2(p_i + p_j)}(1 - p_i - p_j)^t .
\]

Then

\[
\begin{aligned}
E(\text{Cost}) &= 1 + \sum_{i=1}^{n} p_i \sum_{j \ne i} P(k_j \text{ ahead of } k_i) \\
&= 1 + \sum_{i=1}^{n} \sum_{j \ne i} \left[ \frac{p_i p_j}{p_i + p_j} + \frac{p_i (p_i - p_j)}{2(p_i + p_j)}(1 - p_i - p_j)^t \right] \\
&= 1 + 2\sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j}
+ \sum_{1 \le i < j \le n} \frac{(p_i - p_j)^2}{2(p_i + p_j)}(1 - p_i - p_j)^t . \qquad \Box
\end{aligned}
\]

As $t \to \infty$ the last term vanishes and the first two terms give us the steady state cost. The last term then measures the speed of convergence.
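As a sanity check on the theorem, the sketch below (our own illustration; the function names are hypothetical) evaluates the closed form and compares it with a brute-force expectation taken over every initial ordering and every request sequence of length $t$. The brute force is only feasible for very small $n$ and $t$.

```python
from itertools import permutations, product
from math import prod


def zipf(n):
    """Zipf's Law probabilities p_i = 1 / (i * H_n), as floats."""
    h = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (i * h) for i in range(1, n + 1)]


def mtf_expected_cost(p, t):
    """Closed form for the expected move to front cost after t requests."""
    n = len(p)
    cost = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            s = p[i] + p[j]
            if s == 0:
                continue  # a pair that is never requested contributes nothing
            cost += 2 * p[i] * p[j] / s
            cost += (p[i] - p[j]) ** 2 / (2 * s) * (1 - s) ** t
    return cost


def brute_force_cost(p, t):
    """Average over all n! initial orders and all n**t request sequences."""
    n = len(p)
    orders = list(permutations(range(n)))
    total = 0.0
    for order in orders:
        for seq in product(range(n), repeat=t):
            weight = prod(p[k] for k in seq) / len(orders)
            lst = list(order)
            for k in seq:                      # apply the move to front rule
                lst.remove(k)
                lst.insert(0, k)
            total += weight * sum(p[k] * (pos + 1) for pos, k in enumerate(lst))
    return total


if __name__ == "__main__":
    p = zipf(3)
    for t in range(5):
        print(t, mtf_expected_cost(p, t), brute_force_cost(p, t))
```

At $t = 0$ the closed form reduces to $\frac{n+1}{2}$ for any distribution, which is the expected cost of a randomly ordered list.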

Determining the expected cost of the transposition rule as a function of time is much more difficult and has been determined only in some simple cases. We will consider two cases which will serve to illustrate the difference in the rates of convergence of the two rules.

In the first case, we assume that one key ($k_1$) has probability one and the other $n-1$ have probability zero. Using the move to the front rule, the first request will be for $k_1$, which will immediately move to the front and remain there. The cost is then

\[
\begin{cases}
\dfrac{n+1}{2} & \text{for } t = 0, \\[4pt]
1 & \text{for } t > 0.
\end{cases}
\]

Using the transposition rule, $k_1$ is equally likely to start in any position and will move up one position at a time until it reaches the top. We will then have

\[
\text{Prob}(k_1 \text{ is in position 1 at time } t) =
\begin{cases}
\dfrac{t+1}{n}, & t \le n-2, \\[4pt]
1, & t \ge n-1,
\end{cases}
\]

\[
\text{Prob}(k_1 \text{ is in position } i \ne 1 \text{ at time } t) =
\begin{cases}
\dfrac{1}{n} & \text{if } n - i \ge t, \\[4pt]
0 & \text{otherwise.}
\end{cases}
\]

For $t \le n-2$, the expected cost is:

\[
1 \cdot \frac{t+1}{n} + \sum_{i=2}^{n-t} i \cdot \frac{1}{n} = 1 + \frac{1}{n}\binom{n-t}{2}.
\]

For $t \ge n-1$ the expected cost is 1 since $k_1$ must have reached the top.
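The closed form for this first case is easy to confirm by direct enumeration. The sketch below is our own check (the function names are assumptions): it averages the position of $k_1$ over its $n$ equally likely starting positions and compares the result with $1 + \frac{1}{n}\binom{n-t}{2}$.

```python
from fractions import Fraction
from math import comb


def transposition_cost_single_key(n, t):
    """Exact expected cost after t requests when k_1 has probability one."""
    total = Fraction(0)
    for start in range(1, n + 1):      # each starting position has probability 1/n
        position = max(1, start - t)   # k_1 moves up one place per request
        total += Fraction(position, n)
    return total


def closed_form(n, t):
    """1 + C(n - t, 2) / n for t <= n - 2, and 1 afterwards."""
    if t <= n - 2:
        return 1 + Fraction(comb(n - t, 2), n)
    return Fraction(1)


if __name__ == "__main__":
    n = 6
    for t in range(n + 1):
        print(t, transposition_cost_single_key(n, t), closed_form(n, t))
```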

An interesting statistic to compute from these time varying costs is the overwork. This is defined as the area between the cost curve and its asymptote. (See Figure 2.2.1.) The overwork measures how quickly the cost converges to its asymptote. Also, since the area under a cost curve measures the total cost, the overwork represents the total number of comparisons we do in addition to the asymptotic cost.

The overwork can be determined by summing the time varying part of the equation for the cost. The overwork for the move to the front rule is then

\[
\sum_{t=0}^{\infty} \sum_{1 \le i < j \le n} \frac{(p_i - p_j)^2}{2(p_i + p_j)}(1 - p_i - p_j)^t
= \sum_{1 \le i < j \le n} \frac{(p_i - p_j)^2}{2(p_i + p_j)^2}.
\]

This formula allows us to get a simple upper bound on the overwork. Since $\left|\frac{p_i - p_j}{p_i + p_j}\right| \le 1$ we have

\[
\sum_{1 \le i < j \le n} \frac{1}{2}\left(\frac{p_i - p_j}{p_i + p_j}\right)^2
\le \sum_{1 \le i < j \le n} \frac{1}{2} = \frac{n(n-1)}{4},
\]

so the overwork is $O(n^2)$ for the move to the front rule. This bound is interesting since it tells us how significant the overwork can be compared to the asymptotic cost. For example, after $\frac{n(n-1)}{4}$ key requests, the overwork is at most one comparison per request, and the asymptotic cost is a good approximation to the amount of work we have done.
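The overwork formula and its bound translate directly into a few lines of code. This is a minimal sketch under our own naming (`mtf_overwork`, `zipf`); pairs of keys that both have probability zero are skipped, since they never contribute to the cost.

```python
def zipf(n):
    """Zipf's Law probabilities p_i = 1 / (i * H_n), as floats."""
    h = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (i * h) for i in range(1, n + 1)]


def mtf_overwork(p):
    """Move to front overwork: sum_{i<j} (p_i - p_j)^2 / (2 (p_i + p_j)^2)."""
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = p[i] + p[j]
            if s > 0:                  # skip pairs that are never requested
                total += (p[i] - p[j]) ** 2 / (2 * s * s)
    return total


if __name__ == "__main__":
    for n in (3, 4, 10, 20):
        print(n, mtf_overwork(zipf(n)), n * (n - 1) / 4)
```

For a single key of probability one among $n$ keys, every pair involving that key contributes $\frac{1}{2}$, giving an overwork of $\frac{n-1}{2}$.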

For the distribution just considered, the overwork in the move to the front rule is $\frac{n-1}{2}$. The overwork in the transposition rule is

\[
\sum_{t=0}^{n-2} \frac{1}{n}\binom{n-t}{2}
= \frac{1}{n}\sum_{t=2}^{n}\binom{t}{2}
= \frac{1}{n}\binom{n+1}{3}
= \frac{n^2-1}{6}.
\]

[Figure 2.2.1: The definition of the overwork. Cost is plotted against time; the overwork is the area between the cost curve and its asymptote.]

So we see that the move to the front rule does overwork $\Omega(n)$ and the transposition rule does $\Omega(n^2)$, and hence the move to the front rule approaches its asymptote more quickly. Also note that the move to the front rule converges in 1 request, but the transposition rule requires $n-1$ requests in the worst case, since its expected cost is still above 1 for every $t \le n-2$.

We now consider a slightly more complicated case. Suppose there are $n-1$ elements of probability $\frac{1}{n-1}$, and one element ($k_1$) of probability zero. This is not equivalent to the previous case in which $k_1$ was moved (unless it was at the top) after each request. Now $k_1$ may or may not move depending on which of the $n-1$ elements with nonzero probability is accessed.

The overwork for the move to the front rule in this case is $\frac{n-1}{2}$. This can be obtained by substituting the $p_i$ in the overwork formula.

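In order to determine the overwork in the case of the transposition rule, we calculate $P(k,t)$, the probability $k_1$ is in position $k$ at time $t$. Notice that $k_1$ will move down only when the key directly under it is accessed and that this occurs with probability $\frac{1}{n-1}$. We then have for $k < n$:

\[
\begin{aligned}
P(k,t) &= \sum_{i=1}^{k} \text{Prob}(k_1 \text{ initially in position } i) \cdot \text{Prob}(k_1 \text{ moves down } k-i \text{ positions in } t \text{ time steps}) \\
&= \sum_{i=1}^{k} \frac{1}{n}\binom{t}{k-i}\left(\frac{1}{n-1}\right)^{k-i}\left(\frac{n-2}{n-1}\right)^{t-k+i} \\
&= \frac{1}{n}\left(\frac{n-2}{n-1}\right)^{t} \sum_{j=0}^{k-1}\binom{t}{j}\frac{1}{(n-2)^j}.
\end{aligned}
\]

For $k = n$ (probability $k_1$ is in the last position)

\[
\begin{aligned}
P(n,t) &= \sum_{i=1}^{n} \text{Prob}(k_1 \text{ initially in position } i) \cdot \text{Prob}(k_1 \text{ moves down } n-i \text{ positions in} \le t \text{ steps}) \\
&= \sum_{i=1}^{n} \frac{1}{n} \sum_{j=n-i}^{t} \binom{t}{j}\left(\frac{1}{n-1}\right)^{j}\left(\frac{n-2}{n-1}\right)^{t-j} \\
&= \frac{1}{n}\left(\frac{n-2}{n-1}\right)^{t} \sum_{i=1}^{n}\sum_{j=n-i}^{t}\binom{t}{j}\frac{1}{(n-2)^j}.
\end{aligned}
\]

Now if $k$ is the position of $k_1$, then

\[
\text{Cost} = \sum_{\substack{j=1 \\ j \ne k}}^{n} \frac{1}{n-1}\, j
= \frac{1}{n-1}\left(\sum_{j=1}^{n} j - k\right)
= \frac{(n+1)n}{2(n-1)} - \frac{k}{n-1}.
\]

Then

\[
\begin{aligned}
E(\text{cost}) &= \frac{(n+1)n}{2(n-1)} - \frac{E(k)}{n-1}
= \frac{(n+1)n}{2(n-1)} - \frac{\sum_{i=1}^{n} i\,P(i,t)}{n-1} \\
&= \frac{n}{2} + \frac{1}{n(n-1)}\left(\frac{n-2}{n-1}\right)^{t} \sum_{j=0}^{n-2} \frac{1}{(n-2)^j}\binom{t}{j}\binom{n-j}{2}.
\end{aligned}
\]

This gives us the expected cost as a function of time. The overwork is then:

\[
\sum_{t=0}^{\infty} \frac{1}{n(n-1)}\left(\frac{n-2}{n-1}\right)^{t} \sum_{j=0}^{n-2} \frac{1}{(n-2)^j}\binom{t}{j}\binom{n-j}{2}
= \frac{1}{n(n-1)} \sum_{j=0}^{n-2} \frac{1}{(n-2)^j}\binom{n-j}{2} \sum_{t=0}^{\infty}\binom{t}{j}\left(\frac{n-2}{n-1}\right)^{t}.
\]

By use of a Taylor expansion, we can verify

\[
\frac{1}{(1-x)^{k+1}} = \sum_{i=0}^{\infty}\binom{i+k}{k}x^i .
\]

With $x = \frac{n-2}{n-1}$ the inner sum over $t$ equals $(n-1)(n-2)^j$, so the overwork reduces to $\frac{1}{n}\sum_{j=0}^{n-2}\binom{n-j}{2}$. Using this substitution, we can show the overwork equals $\frac{n^2-1}{6}$, which is the same as in our earlier model. Again, the move to the front rule does $\Omega(n)$ overwork, and the transposition rule does $\Omega(n^2)$ overwork.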
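The value $\frac{n^2-1}{6}$ can also be recovered numerically without the Taylor expansion. The sketch below is our own illustration (the function name and the truncation length are assumptions): it tracks the distribution of the position of $k_1$, which moves down one place with probability $\frac{1}{n-1}$ per request until it reaches the bottom, and sums the residual cost $\frac{n - E(k)}{n-1}$ over time.

```python
def tr_residuals_zero_key(n, steps):
    """Residual transposition cost at t = 0..steps-1 when k_1 has probability
    zero and the other n-1 keys each have probability 1/(n-1)."""
    q = 1.0 / (n - 1)            # chance that the key directly under k_1 is requested
    dist = [1.0 / n] * n         # dist[k-1] = P(k_1 is in position k)
    residuals = []
    for _ in range(steps):
        mean_pos = sum((k + 1) * pr for k, pr in enumerate(dist))
        residuals.append((n - mean_pos) / (n - 1))
        nxt = [0.0] * n
        for k, pr in enumerate(dist):
            if k < n - 1:
                nxt[k] += pr * (1 - q)      # k_1 stays where it is
                nxt[k + 1] += pr * q        # k_1 is pushed down one place
            else:
                nxt[k] += pr                # k_1 is already last
        dist = nxt
    return residuals


if __name__ == "__main__":
    for n in (3, 6, 10):
        print(n, sum(tr_residuals_zero_key(n, 5000)), (n * n - 1) / 6)
```

The sum is truncated after a fixed number of steps; because the residual decays geometrically, the truncation error is negligible for the small $n$ used here.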

For this case, it is also possible to obtain simple bounds on the residual cost, i.e. the difference between the cost and the asymptotic cost. By substituting the $p_i$ into the equation for the cost of the move to the front rule, we get $\text{Cost}_{MTF} = \frac{n}{2} + \frac{1}{2}\left(\frac{n-2}{n-1}\right)^t$. The residual cost is then $\frac{1}{2}\left(\frac{n-2}{n-1}\right)^t \approx \frac{1}{2}e^{-t/(n-2)}$ for large $n$.

For the transposition rule, note that if $t \le n-2$, all the terms of a binomial expansion are present in the time-varying cost, and the residual cost equals

\[
\frac{t^2 - t(2n^2 - 4n + 3) + n(n-1)^3}{2n(n-1)^3} \approx \frac{t^2 - 2n^2 t + n^4}{2n^4}
\]

for large $n$.

If $t > n-2$ we can add terms with $n-2 < j \le t$ to complete the binomial expansion and obtain this result as an upper bound on the residual cost. This bound, however, gets progressively poorer as $t$ becomes larger since we must add more and more terms. In fact, the bound goes to infinity as $t \to \infty$.

These two bounds illustrate the different rates of convergence. Initially (when $t \le n-2$) the move to the front rule converges exponentially, and the transposition rule converges quadratically, so the move to the front rule converges considerably more quickly. To give an idea of the magnitude of these bounds, for $t = n-2$, $\text{Residual Cost}_{MTF} \approx .1839$, and at $t = n^2$, $\text{Residual Cost}_{MTF} \approx \frac{1}{2}\left(\frac{1}{e}\right)^{n^2/(n-2)} \approx \frac{1}{2}\left(\frac{1}{e}\right)^{n}$. On the other hand, $\text{Residual Cost}_{TR} \approx \frac{1}{2}$ at $t = n-2$ and $\text{Residual Cost}_{TR} \le \frac{1}{2n}$ for $t \approx n^2$.
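The quadratic closed form can be compared with the binomial sum it comes from. The sketch below is our own check with hypothetical function names: for $t \le n-2$ the two agree exactly, and for larger $t$ the closed form is only an upper bound, as described above.

```python
from fractions import Fraction
from math import comb


def tr_residual_binomial(n, t):
    """Residual transposition cost as the (possibly truncated) binomial sum."""
    q = Fraction(1, n - 1)
    total = sum(comb(t, j) * q ** j * (1 - q) ** (t - j) * comb(n - j, 2)
                for j in range(0, min(t, n - 2) + 1))
    return total / (n * (n - 1))


def tr_residual_closed(n, t):
    """Closed form, exact for t <= n - 2 and an upper bound afterwards."""
    return Fraction(t * t - t * (2 * n * n - 4 * n + 3) + n * (n - 1) ** 3,
                    2 * n * (n - 1) ** 3)


def mtf_residual(n, t):
    """Residual move to front cost, (1/2) ((n-2)/(n-1))^t."""
    return Fraction(n - 2, n - 1) ** t / 2


if __name__ == "__main__":
    n = 20
    for t in (0, n - 2, 4 * n):
        print(t, float(mtf_residual(n, t)),
              float(tr_residual_binomial(n, t)), float(tr_residual_closed(n, t)))
```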

In general, the transposition rule will also converge exponentially, but much more slowly than the move to front rule. The convergence of the cost, which is a sum of terms of the form $c_i \lambda_i^t$ (see appendix), is mainly determined by the size of the eigenvalues with largest modulus. These are much larger in the case of the transposition rule. As a comparison, for Zipf's Law with 3 elements, the eigenvalues which have nonzero $c_i$ are $(1 - p_i - p_j)$ for the move to the front rule (.545, .273, .182). For the transposition rule, these can be numerically calculated as: .710, .576, -.344, .175, -.117. Indeed, the "major" eigenvalues of the transposition rule are larger, and slower convergence will result.

The overwork has been numerically calculated for more complicated distributions. We have already determined a simple form for the overwork in the move to the front rule and can just put in the particular distribution. For the transposition rule, there is no known simple form. We can closely approximate the overwork by letting $x_0 = (\frac{1}{n!}, \ldots, \frac{1}{n!})$ be the initial distribution over the states of the Markov chain. Then $x_0 P^t$ is the distribution after $t$ requests. From this, we can calculate the expected cost at any $t$. The asymptotic cost (Acost) can be determined directly from the steady state probabilities, or approximated by the cost of $x_0 P^t$ for large $t$. The overwork is then

\[
\sum_{i=0}^{\infty} \left[\text{cost}(x_0 P^i) - \text{Acost}\right] \approx \sum_{i=0}^{t} \left[\text{cost}(x_0 P^i) - \text{Acost}\right],
\]

for sufficiently large $t$. This is the quantity we calculate. The overwork for several distributions is shown in Table 2.2.2.

By analyzing the differences between successive values of the overwork in the case of Zipf's Law, we can conclude that the transposition rule does $\Omega(n^3)$ overwork while the move to front rule does only $\Omega(n^2)$. Thus, for a more complicated distribution the transposition rule does much more overwork.

In fact, assuming Zipf's Law, we can derive an exact form for the move to front rule overwork and prove it is $\Omega(n^2)$ and thus the bound of $\frac{n(n-1)}{4}$ is of the right order.
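The numerical procedure just described can be sketched in a few lines. This is our own minimal illustration, not the program used for the thesis: the state space is the set of all $n!$ orderings, the function names are hypothetical, the asymptotic cost is approximated by the cost at the last step, and the approach is practical only for small $n$.

```python
from itertools import permutations


def zipf(n):
    """Zipf's Law probabilities p_i = 1 / (i * H_n), as floats."""
    h = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (i * h) for i in range(1, n + 1)]


def apply_rule(order, key, rule):
    """Return the new ordering after `key` is requested under `rule`."""
    lst = list(order)
    i = lst.index(key)
    if rule == "mtf":
        lst.insert(0, lst.pop(i))
    elif rule == "transpose":
        if i > 0:
            lst[i - 1], lst[i] = lst[i], lst[i - 1]
    else:
        raise ValueError("unknown rule: " + rule)
    return tuple(lst)


def cost_curve(p, rule, steps):
    """Expected access cost at t = 0..steps-1, starting from a random order."""
    n = len(p)
    states = list(permutations(range(n)))
    dist = {s: 1.0 / len(states) for s in states}      # x_0
    costs = []
    for _ in range(steps):
        costs.append(sum(pr * sum(p[k] * (pos + 1) for pos, k in enumerate(s))
                         for s, pr in dist.items()))
        nxt = dict.fromkeys(states, 0.0)
        for s, pr in dist.items():                     # one step of the chain
            for key in range(n):
                nxt[apply_rule(s, key, rule)] += pr * p[key]
        dist = nxt
    return costs


def overwork(p, rule, steps=400):
    """Sum of cost minus (approximate) asymptotic cost over `steps` requests."""
    costs = cost_curve(p, rule, steps)
    acost = costs[-1]
    return sum(c - acost for c in costs)


if __name__ == "__main__":
    for n in (3, 4):
        p = zipf(n)
        print(n, overwork(p, "mtf"), overwork(p, "transpose"))
```

For the move to front rule the result can be compared with the closed form $\sum_{i<j} \frac{(p_i-p_j)^2}{2(p_i+p_j)^2}$; for the transposition rule it can be compared with the entries of Table 2.2.2.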

Table 2.2.2
The Overwork for Various Distributions

Overwork for move to front rule

  English letters     52.7469
  English words      122.3576

Overwork for geometric distribution with N elements

   N    Move to front rule
   3     0.2911
   4     0.8291
   5     1.7564
   6     3.1250
   7     4.9612
   8     7.2860
   9    10.1011
  10    13.4123
  11    17.2216
  12    21.5299
  13    26.3377
  14    31.6452
  15    37.4527
  16    43.7600
  17    50.5674
  18    57.8747
  19    65.6820
  20    73.9893

Overwork for Zipf's Law with N elements

   N    Move to front rule    Transposition rule
   3     0.2006                0.4579
   4     0.4463                1.6503
   5     0.7978                3.9793
   6     1.2576                7.7514
   7     1.8272               13.3005
   8     2.5076
   9     3.2994
  10     4.2031
  11     5.2189
  12     6.3473
  13     7.5882
  14     8.9420
  15    10.4087
  16    11.9884
  17    13.6812
  18    15.4871
  19    17.4063
  20    19.4387

Theorem: Assume that the key request probabilities satisfy Zipf's Law. Then the overwork for the move to front rule with a list of $n$ elements is

\[
\frac{5n^2}{12} - \left(n^2 + n + \frac{1}{6}\right)(H_{2n} - H_n) + \frac{1}{6}H_n + \frac{n(n+1)(2n+1)}{3}\left(H_{2n}^{(2)} - H_n^{(2)}\right),
\]

where

\[
H_n^{(2)} = \sum_{i=1}^{n} \frac{1}{i^2}.
\]

Asymptotically, this is $(\frac{3}{4} - \ln 2)n^2 \approx .057n^2$.

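Proof: Substituting $p_i = \frac{1}{iH_n}$ into the overwork formula gives

\[
\sum_{1 \le i < j \le n} \frac{\left(\frac{1}{iH_n} - \frac{1}{jH_n}\right)^2}{2\left(\frac{1}{iH_n} + \frac{1}{jH_n}\right)^2}
= \frac{1}{2}\sum_{1 \le i < j \le n} \frac{(i-j)^2}{(i+j)^2}.
\]

Since the summand is symmetric in $i$ and $j$, we have

\[
\frac{1}{4}\left[\sum_{1 \le i < j \le n} \frac{(i-j)^2}{(i+j)^2} + \sum_{1 \le j < i \le n} \frac{(i-j)^2}{(i+j)^2}\right]
= \frac{1}{4}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{(i-j)^2}{(i+j)^2}.
\]

Now, making the substitution $k = i+j$, we get

\[
\frac{1}{4}\sum_{k=2}^{n}\sum_{j=1}^{k-1} \frac{(k-2j)^2}{k^2} + \frac{1}{4}\sum_{k=n+1}^{2n}\sum_{j=k-n}^{n} \frac{(k-2j)^2}{k^2}. \tag{1}
\]

The first term in equation (1) equals

\[
\begin{aligned}
&\frac{1}{4}\sum_{k=2}^{n}\left[\sum_{j=1}^{k-1} 1 - \frac{4}{k}\sum_{j=1}^{k-1} j + \frac{4}{k^2}\sum_{j=1}^{k-1} j^2\right] \\
&\quad= \frac{1}{4}\sum_{k=2}^{n}\left[(k-1) - \frac{4}{k}\cdot\frac{(k-1)k}{2} + \frac{4}{k^2}\cdot\frac{(k-1)k(2k-1)}{6}\right] \\
&\quad= \frac{1}{4}\sum_{k=2}^{n}\left[\frac{k}{3} - 1 + \frac{2}{3k}\right] \\
&\quad= \frac{1}{4}\left[\frac{1}{3}\left(\frac{n(n+1)}{2} - 1\right) - (n-1) + \frac{2}{3}(H_n - 1)\right] \\
&\quad= \frac{n^2}{24} - \frac{5n}{24} + \frac{1}{6}H_n. 
\end{aligned}
\tag{2}
\]

The second term in equation (1) equals

\[
\begin{aligned}
&\frac{1}{4}\sum_{k=n+1}^{2n}\left[\sum_{j=k-n}^{n} 1 - \frac{4}{k}\sum_{j=k-n}^{n} j + \frac{4}{k^2}\sum_{j=k-n}^{n} j^2\right] \\
&\quad= \frac{1}{4}\sum_{k=n+1}^{2n}\Bigg[(n - (k-n) + 1) - \frac{4}{k}\left(\frac{n(n+1)}{2} - \frac{(k-n-1)(k-n)}{2}\right) \\
&\qquad\qquad + \frac{4}{k^2}\left(\frac{n(n+1)(2n+1)}{6} - \frac{(k-n-1)(k-n)(2(k-n)-1)}{6}\right)\Bigg].
\end{aligned}
\]

Collecting all equal powers of $k$ gives

\[
\begin{aligned}
&\frac{1}{4}\sum_{k=n+1}^{2n}\left[-\frac{k}{3} + (2n+1) - \frac{4n^2 + 4n + \frac{2}{3}}{k} + \frac{4n(n+1)(2n+1)}{3k^2}\right] \\
&\quad= \frac{1}{4}\Bigg[-\frac{1}{3}\left(\frac{2n(2n+1)}{2} - \frac{n(n+1)}{2}\right) + (2n+1)n - \left(4n^2 + 4n + \frac{2}{3}\right)(H_{2n} - H_n) \\
&\qquad\qquad + \frac{4n(n+1)(2n+1)}{3}\left(H_{2n}^{(2)} - H_n^{(2)}\right)\Bigg] \\
&\quad= \frac{3}{8}n^2 + \frac{5}{24}n - \left(n^2 + n + \frac{1}{6}\right)(H_{2n} - H_n) + \frac{n(n+1)(2n+1)}{3}\left(H_{2n}^{(2)} - H_n^{(2)}\right).
\end{aligned}
\tag{3}
\]

Adding (2) and (3) gives the total result,

\[
\frac{5}{12}n^2 - \left(n^2 + n + \frac{1}{6}\right)(H_{2n} - H_n) + \frac{1}{6}H_n + \frac{n(n+1)(2n+1)}{3}\left(H_{2n}^{(2)} - H_n^{(2)}\right). \tag{4}
\]

To determine the asymptotic behavior, note that $H_n \sim \ln n$, so $H_{2n} - H_n \sim \ln 2n - \ln n = \ln 2$, so the second term is asymptotically $n^2 \ln 2$. The third term is $O(\log n)$ and is dominated by the $n^2$ terms. Finally, we need to approximate $H_{2n}^{(2)} - H_n^{(2)} = \sum_{i=n+1}^{2n} \frac{1}{i^2}$. Since the summand is a decreasing function, we can bound it using the following relation:

\[
\int_{a}^{b+1} f(x)\,dx \le \sum_{i=a}^{b} f(i) \le f(a) + \int_{a}^{b} f(x)\,dx .
\]

Substituting $a = n+1$, $b = 2n$ and $f(x) = \frac{1}{x^2}$ gives

\[
\int_{n+1}^{2n+1} \frac{dx}{x^2} \le \sum_{i=n+1}^{2n} \frac{1}{i^2} \le \frac{1}{(n+1)^2} + \int_{n+1}^{2n} \frac{dx}{x^2},
\]

that is,

\[
\frac{n}{(2n+1)(n+1)} \le \sum_{i=n+1}^{2n} \frac{1}{i^2} \le \frac{n^2 + 2n - 1}{2n(n+1)^2}.
\]

Since both the upper and lower bounds equal $\frac{1}{2n} + O(n^{-2})$, we have $\sum_{i=n+1}^{2n} \frac{1}{i^2} = \frac{1}{2n} + O(n^{-2})$ and the fourth term in (4) approaches $\frac{1}{3}n^2$. Hence, the asymptotic value for (4) is

\[
\frac{5}{12}n^2 - n^2 \ln 2 + \frac{1}{3}n^2 = \left(\frac{3}{4} - \ln 2\right)n^2 \approx .057n^2 . \qquad \Box
\]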
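The exact form can be checked against the defining double sum with rational arithmetic. The following sketch is our own verification (the function names are hypothetical); it compares $\frac{1}{2}\sum_{i<j} \frac{(i-j)^2}{(i+j)^2}$ with the closed form in terms of $H_n$ and $H_n^{(2)}$.

```python
from fractions import Fraction


def harmonic(n, r=1):
    """Generalized harmonic number H_n^(r) = sum_{i=1}^{n} 1 / i**r."""
    return sum(Fraction(1, i ** r) for i in range(1, n + 1))


def zipf_overwork_direct(n):
    """(1/2) * sum_{i<j} (i - j)^2 / (i + j)^2, straight from the overwork formula."""
    return sum(Fraction((i - j) ** 2, 2 * (i + j) ** 2)
               for i in range(1, n + 1) for j in range(i + 1, n + 1))


def zipf_overwork_closed(n):
    """Closed form of the move to front overwork under Zipf's Law."""
    return (Fraction(5, 12) * n * n
            - (n * n + n + Fraction(1, 6)) * (harmonic(2 * n) - harmonic(n))
            + harmonic(n) / 6
            + Fraction(n * (n + 1) * (2 * n + 1), 3)
            * (harmonic(2 * n, 2) - harmonic(n, 2)))


if __name__ == "__main__":
    for n in (3, 4, 10, 20, 200):
        value = zipf_overwork_closed(n)
        print(n, float(value), float(value) / (n * n))
```

The last printed column approaches $\frac{3}{4} - \ln 2$ slowly, since the lower-order terms of the closed form are of order $n$.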

We can get a graphic idea of the difference in convergence from Figure 2.6.1 and Table 2.6.1 in Section 2.6. These show the cost of accessing a list ordered by the two rules as a function of time and compare them with the frequency count rule (see Section 2.6) which is optimal. From graphs like these, it is interesting to calculate the smallest number of requests for which it is better to use the transposition rule (see Table 2.2.3). Note that the value we are really interested in is not the point where the two cost curves cross, but the point where the integrals of the two curves cross. This is because we want the rule that does the least total work.

Table 2.2.3
Cost Crossover and Integral Crossover Times

   n    Cost Crossover    Integral Crossover
   3      3                  6
   4      5                 10
   5      7                 14
   6     10                 20
   7     13                 27
  10     22                 50
  20     75                212

Points where the cost and integral of the cost for the transposition rule become less than that of the move to front rule, for an n-element list with Zipf's Law as the probability distribution.

The slope of the cost crossover in Table 2.2.3 increases, so it is superlinear and may be about $\Omega(n \log n)$. The integral crossover appears to be $\Omega(n^2)$. We can also get an estimate of the integral crossover as follows: If we assume all the overwork has been done by time $t$, the integral crossover time, then the cost integral for the move to front rule is $t$ times the asymptotic cost ($AS_{MTF}$) plus the overwork ($OV_{MTF}$), and similarly for the transposition rule. Since we are at the point where these integrals cross,

\[
t \cdot AS_{MTF} + OV_{MTF} = t \cdot AS_{TR} + OV_{TR},
\qquad
t = \frac{OV_{TR} - OV_{MTF}}{AS_{MTF} - AS_{TR}}.
\]

Earlier in this section, we found $OV_{TR} = \Omega(n^3)$ and $OV_{MTF} = \Omega(n^2)$. Since the asymptotic costs are bounded within twice the optimal cost (which is about $\frac{n}{\ln n}$), $AS_{MTF} - AS_{TR} = O(\frac{n}{\ln n})$ and hence we get $t = \Omega(n^2 \ln n)$, which is slightly larger than shown in Table 2.2.3.

In summary, though the transposition rule has lower asymptotic cost than the move to front rule, it converges to that cost much more slowly, and, in fact, for Zipf's law, it will require $\Omega(n^2)$ key requests before it becomes more economical to use the transposition rule.

2.3  Other  Permutation  Rules 

We previously defined the idea of a permutation rule, where a permutation $\tau_i$ is performed on the list when the key in location $i$ is requested. So far, we have only considered two such rules: the move to front rule and the transposition rule. There are a total of $(n!)^n$ possible such rules, but most will just senselessly jumble the list, resulting in no decrease in cost.

Let us think intuitively about what a "sensible" rule must be like. We will see that a sensible rule should move the requested key up in the list by a certain amount (which may depend on the location of the requested key). This is the only good way to use the information that this key, having been requested, should have higher probability. Any permutation not of this form can be viewed as performing first a sensible permutation and then a permutation that leaves the requested element alone. This second permutation will only increase the disorder of the list since no additional information has been given on these keys, and permuting them will work against the order we are trying to create.

We consider the following sort of sensible rule, which moves the requested key $k$ positions ahead for some fixed $k$. Another type of rule that should behave similarly is one where the requested key is moved some fixed fraction of the distance to the top.

It can be seen from Figure 2.3.1 (due to Rivest [2]) and Table 2.3.1 that as the distance the requested key moves is increased, the asymptotic cost increases and the rules converge more quickly, forming a spectrum of rules, ranging from the move ahead 1 (transposition) rule at one end to the move ahead $n-1$ (move to front) rule at the other.
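The move ahead $k$ family is simple to state in code. The following is a minimal sketch of one way to implement it on a Python list; the function name `move_ahead` and the in-place convention are our own choices, not taken from the thesis.

```python
def move_ahead(lst, key, k):
    """Access `key` in `lst`, then move it up to k positions toward the front.

    Returns the access cost (the 1-based position at which the key was found).
    The list is modified in place. With k = 1 this is the transposition rule;
    with k = len(lst) - 1 it is the move to front rule.
    """
    i = lst.index(key)            # sequential search; the cost is i + 1 compares
    j = max(0, i - k)             # a key near the top cannot move past the front
    lst.insert(j, lst.pop(i))
    return i + 1


if __name__ == "__main__":
    keys = list("abcdef")
    print(move_ahead(keys, "e", 1), keys)   # transposition
    print(move_ahead(keys, "f", 5), keys)   # move to front for n = 6
    print(move_ahead(keys, "c", 2), keys)   # an intermediate rule
```

The fractional variant mentioned above could be obtained by replacing `i - k` with a position computed from a fixed fraction of `i`.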

[Figure 2.3.1: Asymptotic comparison of the move ahead k rules. This figure, due to Rivest [2], compares the cost of different move ahead k rules ($A_i$ refers to the move ahead $i$ rule) for a list of seven elements whose probabilities are given by Zipf's Law. The vertical axis runs from 1.00 to 1.20, and the plotted rules range from the transposition rule to the move to front rule.]

Table 2.3.1
Comparison of the Convergence of the Move Ahead k Rules

   k    Lowest Cost    Lowest Total Cost
   5     0 - 3           0 - 5
   4     4               6 - 8
   3     5 - 7           9 - 13
   2     8 - 15         14 - 38
   1    16 - ∞          39 - ∞

The results of a simulation using a list of 6 elements whose probabilities are given by Zipf's Law show the time interval for which each move ahead k rule has lowest cost and lowest total cost (the total cost is the cost summed over all previous requests).

2.4 A Hybrid Rule

We can get a rule that is superior to any of those we have considered so far by relaxing the constraint that the rule cannot vary with respect to time. A hybrid rule can be envisioned that moves keys to the front for some initial period of time, then switches and begins transposing. Such a rule will enjoy the advantages of both rules. Initially, it will move keys to the front and will therefore converge quite rapidly. Asymptotically, it will behave like the transposition rule and therefore will have a low asymptotic cost.

The question is when we should switch rules. To help answer this question, a simulation was run using Zipf's Law for the key request probabilities. Each trial of the simulation used the move to front rule until the expected decrease from using the transposition rule became larger than that of the move to front rule. The number of requests required for this to occur is an approximation to the correct time to switch. These times were then averaged over all trials to give the results shown in Table 2.4.1. The results of this simulation indicate .268n + .980 as the best time to switch rules. This time, of course, depends on the request probabilities, but we would not expect it to vary too much for different distributions. Furthermore the choice is not too critical. Since the transposition rule converges so slowly, little is lost if we use the move to front rule for too long. We need only make sure that our choice is large enough to have the move to front rule be close to its asymptote. We would then switch after .5n requests, to make sure we had used the move to front rule long enough to significantly reduce the cost.
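A fixed-time hybrid of this kind might be sketched as follows. The class name `HybridList` and its interface are hypothetical; the default switch time of $.5n$ requests follows the suggestion above, and any other value (for example one based on $.268n + .980$) can be passed in explicitly.

```python
import math


class HybridList:
    """Self-organizing list: move to front at first, transposition afterwards."""

    def __init__(self, keys, switch_after=None):
        self.items = list(keys)
        n = len(self.items)
        # Assumed default: switch after about .5n requests, as suggested in the text.
        self.switch_after = math.ceil(0.5 * n) if switch_after is None else switch_after
        self.requests = 0

    def access(self, key):
        """Find `key`, reorganize the list, and return the number of compares."""
        i = self.items.index(key)
        if i > 0:
            if self.requests < self.switch_after:
                self.items.insert(0, self.items.pop(i))       # move to front phase
            else:
                self.items[i - 1], self.items[i] = self.items[i], self.items[i - 1]
        self.requests += 1
        return i + 1


if __name__ == "__main__":
    lst = HybridList("abcde", switch_after=2)
    for key in "dec":
        print(key, lst.access(key), "".join(lst.items))
```

The switch here depends only on the request count, so the rule carries one counter of overhead beyond the plain move to front or transposition rules.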

Another method would be to estimate our position on the cost curve by counting the number of compares we require and averaging over a period of time. Once this estimate stops decreasing, we suspect that we are in the flat part of the cost curve, and we switch to the transposition rule. This rule has the overhead of counting the number of comparisons. In addition, we must be careful not to average over too short a period, or we may switch too soon.

The hybrid rule is best employed when we expect an intermediate number of requests. If few ($O(n^2)$) requests are expected, then the move to front rule is used. A great number suggests the transposition rule. An intermediate number means that both of the good features of the hybrid (fast convergence and low cost) will be valuable and the overhead incurred by using this rule will be worthwhile.

Table 2.4.1
Best Times to Switch Rules

   n    Average Switch Time
   3     1.90
   4     2.20
   5     2.29
   6     2.52
   7     2.84
   8     2.94
   9     3.35
  10     3.55
  20     6.608
  30     8.924

Simulation showing the average best time to switch from the move to front rule to the transposition rule. The probability distribution is Zipf's Law over n elements.

2.5 The First Request Rule

The first request rule is defined as follows: the first time a key is requested, it is moved up in the list until it comes to the top or a previously requested key. After that, it is not moved. Note that the keys occur in the list in order of their first request. After all keys have been requested, the ordering obtained is the same as if the keys had not been known a priori, and the list had been built by inserting a "new" key (one that had been requested for the first time) at the end of the list.

The following theorem characterizes the performance of this rule.

Theorem: Given any initial list, the probability of obtaining a given final list after any number of requests is the same for the move to front and first request rules.

Proof: Consider any sequence of requests $r_1 \ldots r_t$ as inputs to the move to front rule, and the reverse sequence $r_t \ldots r_1$ as inputs to the first request rule. Note that these two sequences have the same probability. Suppose that both rules start with the same list. We now show that these two sequences produce the same ordering. Consider any two keys $k_i$ and $k_j$. If neither is requested, both rules will leave their initial order unchanged, and $k_i$ and $k_j$ will be ordered the same in the two final lists. If only one (say $k_i$) is requested, then both rules will have $k_i$ ahead of $k_j$ in the final list. If both are requested (say the last request for $k_i$ comes after the last request for $k_j$ in the sequence $r_1 \ldots r_t$), then $k_i$ will be ahead of $k_j$ in the move to front list. Since the first request for $k_i$ then comes before the first request for $k_j$ in the sequence $r_t \ldots r_1$, $k_i$ will also be ahead of $k_j$ in the first request list, hence the orderings in the final list will again be the same. In any case, $k_i$ is ahead of $k_j$ in one list if and only if it is ahead of $k_j$ in the other. Hence the two lists must have the same ordering.

Now consider any list. For each sequence of requests that will produce this list using one rule, there exists a sequence of equal probability that will produce this same list using the other rule. Hence the probability for either rule to produce this list must be equal. $\Box$

This  theorem  is  easily  extended  to  hold  for  a  probability 
distribution  over  initial  lists  since  the  two  rules  will  behave 
identically  for  each  initial  list.  Also,  it  implies  that  the  cost  of  the 
first  request  rule  at  any  time  will  equal  the  cost  of  the  move  to  front 
rule.  Therefore  all  the  previous  results  concerning  the  move  to  front 
rule  apply  to  the  first  request  rule. 
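The pairing of request sequences used in the proof can be exercised directly. The sketch below is our own illustration (the function names are hypothetical): it implements both rules and checks on random inputs that the move to front rule applied to $r_1 \ldots r_t$ and the first request rule applied to $r_t \ldots r_1$ end with the same list.

```python
import random


def move_to_front(initial, requests):
    """Final list after applying the move to front rule to `requests`."""
    lst = list(initial)
    for key in requests:
        lst.insert(0, lst.pop(lst.index(key)))
    return lst


def first_request(initial, requests):
    """Final list after applying the first request rule to `requests`."""
    lst = list(initial)
    requested = set()
    seen = 0                     # the first `seen` entries are already-requested keys
    for key in requests:
        if key not in requested:
            # Move the key up until it meets the top or a previously requested key.
            lst.insert(seen, lst.pop(lst.index(key)))
            requested.add(key)
            seen += 1
    return lst


if __name__ == "__main__":
    rng = random.Random(1976)
    for _ in range(5):
        initial = list("abcdef")
        rng.shuffle(initial)
        reqs = [rng.choice("abcdef") for _ in range(rng.randint(0, 10))]
        print("".join(reqs),
              "".join(move_to_front(initial, reqs)),
              "".join(first_request(initial, reqs[::-1])))
```

Note that the two rules generally differ on the same sequence; the equality holds only when one of them is fed the reversed sequence, which has the same probability under independent requests.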

Suppose  the  keys  were  not  known  a  priori  and  the  list  was 
constructed  by  inserting  a  "new"  key  at  the  end  of  the  list.  Clearly, 
the  asymptotic  distribution  will  be  that  of  the  first  request  rule. 
This  theorem  tells  us  that  if  the  initial  list  was  constructed  in  this 
manner,  using  the  move  to  the  front  rule  will  not  decrease  the  cost 
(since  the  Markov  chain  will  be  in  steady  state). 

The first request rule differs from the move to front rule in two important ways. First, since each key is moved only once, it is cheaper to execute than the move to front rule. Second, since the list converges to a specific ordering (which may have very high cost), the variance of the cost is much higher than that of the move to front rule.

The  first  request  rule  can  be  modeled  by  a  Markov  chain  with 
(n+l)n!  states.  For  each  of  the  n!  orderings  of  the  list,  the  chain 
can  be  in  n+1  different  states,  depending  on  whether  0,1,...,  or  n 
different  keys  have  been  requested.  Unlike  previous  chains,  this 
chain is reducible (see appendix). Once we reach a state in which all
n keys have been requested, we are "trapped" and cannot leave this state.

On  the  other  hand,  an  irreducible  chain  cannot  get  trapped 
and  must  divide  its  time  among  all  states  that  have  nonzero  steady  state 
probability.  In  fact,  the  ergodic  theorem  tells  us  that  if  a  state  has 
steady  state  probability  p,  the  chain  will  spend  a  fraction  of  its 
time  equal  to  p  in  this  state. 

We are now in a position to talk about the variance of the costs of
these two rules. If we let $c_i$ be a random variable equal to the cost
of the state the chain is in at time $i$, then $E(c_i)$, $\mathrm{VAR}(c_i)$
and $E\left(\frac{c_1+c_2+\cdots+c_t}{t}\right)$ are the same for both rules.
However, the variance of the cost averaged over some time period,
$\mathrm{VAR}\left(\frac{c_1+\cdots+c_t}{t}\right)$, is much greater for
the first request rule. The fact that the move to front rule is
irreducible implies

$$\lim_{t\to\infty} \mathrm{VAR}\left(\frac{c_1+\cdots+c_t}{t}\right) = 0$$

(see appendix). However, for large $n$, $c_n = c_\infty$ using the first
request rule (since the chain has reached a final state), and
$\frac{c_1+\cdots+c_t}{t} \approx c_\infty$. Therefore the variance of
the average cost is $\mathrm{VAR}(c_\infty) > 0$. (An expression for this
variance can be found in McCabe [10].)

We  can  use  the  first  request  rule  to  form  a  hybrid  rule  with 
the  transposition  rule  as  follows:  When  a  key  is  first  requested,  we 
use  the  first  request  rule  and  move  the  key  up  until  a  previously 
requested  key  is  encountered.  When  the  key  is  subsequently  requested, 
we  use  the  transposition  rule  to  promote  it  in  the  list. 

The performance of this hybrid is better than the move to
front/transpose  hybrid.  The  only  requests  handled  by  the  transposition 
rule  are  second  and  subsequent  requests.  Hence  the  initial  list  that 
the  transposition  part  of  the  hybrid  "sees"  is  a  list  ordered  by  the 
first  request  rule,  which,  as  we  have  seen,  is  the  move  to  front  rule 
(or  first  request  rule)  steady  state.  Hence  the  transposition  rule 
"starts"  from  the  move  to  front  rule  steady  state.  This  is  an  improve- 
ment over  using  the  move  to  front  rule  initially  since  then  the  steady 
state  is  never  reached.  In  addition,  this  hybrid  will  reduce  the  cost 
more  quickly  than  the  first  request  rule  because  it  does  a  cost  re- 
ducing transposition  on  second  and  subsequent  requests  of  a  key,  while 
the  first  request  rule  alone  does  nothing.  This  hybrid  also  has  the 
desirable  feature  that  no  guesswork  need  be  done  as  to  when  to  switch 
rules.  This  choice  is  performed  automatically  by  the  algorithm. 
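
The hybrid can be sketched in a few lines. This is an illustrative sketch only; the function name `hybrid_request` and the use of a Python list plus a set of already-requested keys are assumptions, not part of the source.

```python
def hybrid_request(lst, requested, key):
    """Process one request with the first request / transposition hybrid.

    lst       -- list of keys, mutated in place
    requested -- set of keys requested at least once, mutated in place
    Returns the search cost (1-based position of the key before any move).
    """
    pos = lst.index(key)
    if key not in requested:
        # First request: move up until a previously requested key is met.
        # Previously requested keys always occupy the front of the list,
        # so the key lands at index len(requested).
        lst.insert(len(requested), lst.pop(pos))
        requested.add(key)
    elif pos > 0:
        # Second and later requests: transposition rule.
        lst[pos - 1], lst[pos] = lst[pos], lst[pos - 1]
    return pos + 1


# Hypothetical usage on a four-key list.
keys = list("dcba")
seen = set()
costs = [hybrid_request(keys, seen, k) for k in ["a", "b", "b", "c"]]
```

No switch-over point has to be chosen: the membership test on `requested` decides per key which rule applies, which mirrors the automatic choice described above.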

2.6  Frequency  Count  Rule 

Perhaps  the  most  natural  way  to  cause  high  frequency  keys  to 
move to higher positions in the data structure would be to keep count
of how many times each key has been requested. If we assume the request
probabilities  are  constant  with  respect  to  time  and  then  keep  the  keys 
sorted  according  to  their  frequency  counts,  high  probability  keys  will 
move  to  the  top. 

The  primary  advantage  of  this  rule  is  that  it  has  a  lower 
access  time  than  the  other  rules  we  have  considered.  In  fact,  its 
performance  is  optimal.  In  addition,  frequency  information  is  available 
for  analysis,  which  may  be  desirable,  and  the  changes  required  to 
execute  the  rule  are  quite  simple.  The  primary  disadvantage  is  that 
count  fields  must  be  kept,  requiring  extra  storage.  These  points  are 
now  considered  in  greater  detail. 

We  first  discuss  the  performance  of  this  rule.  The  following 
theorem  shows  that  it  is  asymptotically  optimal. 

Theorem: As the number of requests $t \to \infty$, a list ordered by the
frequency count rule approaches the optimal ordering.

Proof: If two keys $k_i$ and $k_j$ have probabilities $p_i$ and $p_j$ with
$p_i > p_j$, the probability that $k_i$ is ahead of $k_j$ after $t$ requests
approaches one as $t \to \infty$. Since

$$E(\mathrm{Cost}) = \sum_{i=1}^{n} p_i \Bigl(1 + \sum_{j \ne i} \mathrm{Prob}(k_j \text{ is ahead of } k_i)\Bigr)$$

we have

$$\lim_{t\to\infty} E(\mathrm{Cost}) = \sum_{i=1}^{n} i\,p_i$$

which is the optimal cost. □

Also, if we have no a priori reason to suspect $k_i$ is more
probable than $k_j$, this rule is optimal at any time.

Theorem: If we have no a priori knowledge of the probability distribution,
the frequency count rule provides the optimal ordering at any time.

Proof: If we have no a priori knowledge, then all distributions of key
requests must be considered equally likely, so if $k_i$ has occurred more
times than $k_j$, $\mathrm{Prob}(p_i > p_j) > \mathrm{Prob}(p_i < p_j)$, and an
arrangement with $k_i$ ahead of $k_j$ will have a lower expected cost.
Clearly, the arrangement with the lowest expected cost will be the one in
which the keys are sorted by frequency count, and this, of course, is the
arrangement given by the count rule. □

A  comparison  with  previous  rules  is  given  by  Figure  2.6.1, 
which  shows  the  results  of  a  simulation  done  on  a  15-element  list  using 
Zipf's  Law.  Table  2.6.1  shows  a  simulation  for  a  list  with  100  elements. 
These  two  simulations  give  us  a  good  idea  of  the  differing  rates  of 
convergence of the two previous rules and how they compare to the optimum.
Initially,  the  move  to  the  front  cost  decreases  nearly  as  quickly  as 
that  of  the  count  rule.  This  is  intuitively  reasonable:  Initially,  the 
count  rule  will  move  the  requested  item  close  to  the  top,  so  its  behavior 
should  be  very  close  to  the  move  to  the  front  rule.  On  the  other  hand, 
the  transposition  rule's  cost  decreases  very  slowly,  especially  on  the 
100  element  list. 

As mentioned before, the changes required by this rule after
each request are small. Suppose $k_i$ is requested for the $r$-th time. The
only change is to increase $k_i$'s frequency count from r-1 to r and move
it ahead of all keys having frequency count r-1. We can easily determine
where $k_i$ should move in the following manner. During our search for
$k_i$ we keep a pointer to the key furthest down in the list whose count
is greater than that of the key we are currently examining. When we
examine $k_i$, this pointer will point to $k_i$'s new location. Note that
after many requests, the count fields will be widely separated, and these
moves will rarely be required.

[Figure 2.6.1  Comparison of Various Rules. Simulation on a 15-element
list using Zipf's Law; "time" is measured as the number of requests.]

Table 2.6.1  Another Comparison
Time varying cost for Zipf's Law with 100 elements

TIME    MOVE TO FRONT    TRANSPOSITION    FREQUENCY COUNT
  0        50.1322          50.1322           50.1322
  1        47.4562          50.0749           47.4556
  2        45.4688          50.0307           45.4608
  3        43.4281          49.9742           43.4257
  4        41.8796          49.9329           41.8443
  5        40.7099          49.8780           40.6515
  6        39.7503          49.8341           39.6460
  7        38.5905          49.7742           38.4920
  8        37.8538          49.7196           37.6994
  9        36.5972          49.6666           36.4395
 10        36.1294          49.6216           35.8960
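
The update step described above can be sketched as follows. This is an assumption-laden illustration: the list is modeled as a Python list of `[key, count]` pairs kept in non-increasing count order, and the name `count_rule_request` is hypothetical. The variable `target` plays the role of the pointer kept during the search.

```python
def count_rule_request(entries, key):
    """Process one request under the frequency count rule.

    entries -- list of [key, count] pairs in non-increasing count order,
               mutated in place
    Returns the search cost (1-based position before the move).
    """
    target = 0  # slot just behind the last key with a strictly larger count
    for pos, (k, count) in enumerate(entries):
        if pos > 0 and entries[pos - 1][1] > count:
            target = pos  # a new run of equal counts starts here
        if k == key:
            entry = entries.pop(pos)
            entry[1] += 1
            # Reinsert ahead of all keys that still have the old count.
            entries.insert(target, entry)
            return pos + 1
    raise KeyError(key)


# Hypothetical usage.
table = [["a", 3], ["b", 1], ["c", 1], ["d", 0]]
cost_c = count_rule_request(table, "c")
cost_d = count_rule_request(table, "d")
```

Because `target` is maintained during the same left-to-right scan that finds the key, no second pass over the list is needed.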

The  primary  disadvantage  of  this  rule  is  the  additional  storage 
required  for  the  count  fields.  The  storage  required,  however,  can  be 
reduced  using  very  simple  techniques.  From  the  updating  algorithm,  we 
can see that the actual count a key has is not important. What matters is
the difference between successive counts, because this gives us all the
information we need to keep the keys ordered with respect to count. If
we store this difference instead of the full count, we will require
less storage, since the rate of growth of the difference fields is
proportional to the difference in successive probabilities (which is
small), while the count fields grow in proportion to the probabilities.
Note that only a small amount of work is required to update the
difference fields since after a request, only one count field changes, and
hence at most two difference fields must be updated.

Thus,  the  count  rule  is  a  very  attractive  rule.  Asymptotically 
it  approaches  the  optimal  ordering.  At  any  time,  it  provides  us  with 
the  list  which  has  lowest  cost,  based  on  the  requests  we  have  seen  so 
far.  The  work  required  to  update  the  list  is  also  very  small.  The 
primary  disadvantage  is  the  extra  storage  required.  However,  this 
disadvantage  can  be  reduced  by  storing  the  differences  between  successive 
counts. 

2.7  Limited  Difference  Rules 

We  now  consider  a  set  of  rules  which  limit  the  size  of  the 
difference  fields  in  the  frequency  count  rule.  Once  a  difference  field 
reaches  this  limit,  additional  requests  of  the  more  frequent  key  leave 
this  field  unchanged  (requests  to  the  other  key,  of  course,  decrease 
this  field). 

If  the  maximum  difference  is  zero,  then  the  algorithm  will 
move  a  key  to  the  front  when  it  is  requested,  and  will  perform  exactly 
like  the  move  to  front  rule.  As  the  maximum  difference  is  increased, 
the  performance  will  improve,  with  the  full  count  rule  (no  maximum  dif- 
ference) as  the  limit.  Therefore,  performance  approaches  the  optimum 
as the number of bits is increased.

To see how much the performance is affected by the number of
bits, let us consider a list with only 2 elements, having probabilities
$a$ and $b$ ($= 1-a$). If the maximum difference is at most $n$, then the
corresponding Markov chain has $2n+2$ states:

$A_i$, $0 \le i \le n$, where the key with probability $a$ is first in
the list and the difference is $i$.

$B_i$, $0 \le i \le n$, where the key with probability $b$ is first in
the list with difference $i$.

It is easy to verify that the steady state equations are:

$$\begin{aligned}
A_n &= aA_{n-1} + aA_n, & B_n &= bB_{n-1} + bB_n,\\
A_i &= aA_{i-1} + bA_{i+1}, & B_i &= bB_{i-1} + aB_{i+1}, \qquad 2 \le i \le n-1,\\
A_1 &= bA_2 + aA_0 + aB_0, & B_1 &= aB_2 + bB_0 + bA_0,\\
A_0 &= bA_1, & B_0 &= aB_1,
\end{aligned}$$

and, in addition, $\sum_{i=0}^{n} A_i + \sum_{i=0}^{n} B_i = 1$.

We solve this system of equations to get

$$A_i = A_n\left(\frac{b}{a}\right)^{n-i}, \qquad B_i = A_n\left(\frac{b}{a}\right)^{n+i}, \qquad 1 \le i \le n,$$

and

$$A_n = \frac{a-b}{a\left[1 - \left(\frac{b}{a}\right)^{2n+1}\right]}.$$

The cost of the list is

$$\begin{aligned}
&(a+2b)\,\mathrm{Prob}(\text{key with probability } a \text{ is first in list})
 + (b+2a)\,\mathrm{Prob}(\text{key with probability } b \text{ is first})\\
&\quad= (a+2b)\sum_{i=0}^{n} A_i + (b+2a)\sum_{i=0}^{n} B_i\\
&\quad= (1+a) + \frac{2b(b-a)\left(\frac{b}{a}\right)^{n} - (b-a)}{\left(\frac{b}{a}\right)^{2n+1} - 1}.
\end{aligned}$$

Let us now suppose that $b > a$. Then the optimal cost is $b+2a = b+a+a =
1+a$, which is the first term in the cost expression. The difference
from the optimum is then given by

$$\frac{2b(b-a)\left(\frac{b}{a}\right)^{n} - (b-a)}{\left(\frac{b}{a}\right)^{2n+1} - 1},$$

which, since $\frac{b}{a} > 1$, behaves like
$2b(b-a)\big/\left(\frac{b}{a}\right)^{n+1}$ for large $n$.

Hence we see that the "use" of adding one to the maximum difference
decreases exponentially with base $\frac{b}{a}$. This tells us that the
performance should be improved by the addition of just a few bits.
However, the "flatness" of the distribution (determined by how close
$\frac{b}{a}$ is to one in this simple case) determines how many bits will
be required. The flatter the distribution, the more bits will be required
to correctly distinguish the more probable elements.
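
The two-element analysis can be cross-checked numerically. In the sketch below, the closed form for Prob(key with probability a is first) is obtained by summing the $A_i$ given above (an algebraic step not spelled out in the source), and the chain is also iterated directly. All function names are illustrative assumptions, and the closed form assumes $a \ne 1/2$.

```python
def closed_form_prob_a_first(a, n):
    """Steady state Prob(key with probability a is first), maximum difference n.

    Obtained by summing the A_i of the text; requires a != 1/2.
    """
    b = 1.0 - a
    r = b / a
    return (1.0 - 2.0 * b * r**n) / (1.0 - r ** (2 * n + 1))


def iterated_prob_a_first(a, n, steps=5000):
    """The same probability, found by iterating the 2n+2 state chain."""
    b = 1.0 - a
    A = [1.0] + [0.0] * n   # A[i]: key 'a' first, difference i
    B = [0.0] * (n + 1)     # B[i]: key 'b' first, difference i
    for _ in range(steps):
        new_a = [0.0] * (n + 1)
        new_b = [0.0] * (n + 1)
        for i in range(n + 1):
            # A request for the leading key raises the difference (capped at n).
            new_a[min(i + 1, n)] += a * A[i]
            new_b[min(i + 1, n)] += b * B[i]
            if i > 0:
                # A request for the trailing key lowers the difference.
                new_a[i - 1] += b * A[i]
                new_b[i - 1] += a * B[i]
            else:
                # At difference 0 the trailing key overtakes the leader.
                new_b[min(1, n)] += b * A[0]
                new_a[min(1, n)] += a * B[0]
        A, B = new_a, new_b
    return sum(A)


def expected_cost(a, n):
    """Expected cost (1+a) + (b-a) * Prob(key with probability a is first)."""
    b = 1.0 - a
    return (1.0 + a) + (b - a) * closed_form_prob_a_first(a, n)
```

With a maximum difference of zero both functions reduce to the move to front value a, which is a quick sanity check on the transition structure.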

Table 2.9.1 shows the results of a simulation run on larger
lists. Even using a small maximum difference provides nearly optimal
results. 

The  limited  difference  rule  lets  us  use  a  limited  amount  of 
storage,  while  providing  nearly  optimal  results.  For  a  two  element 
list,  the  cost  of  this  rule  approaches  the  optimum  exponentially  as  we 
increase the maximum difference.

2.8 Wait c, Move and Clear Rules

We  now  consider  two  classes  of  rules  that  use  bit  fields  to 
store  information  about  key  requests.  The  first  class  uses  the  bit 
field  as  a  counter,  initially  zero,  that  is  incremented  by  one  each 
time the key is accessed. Once the field exceeds its maximum value, the
key  is  moved  (using  either  the  move  to  front  or  transposition  rule)  and 
the  field  of  every  key  is  reset  to  zero.  The  cost  of  performing  this 
may  be  very  significant.  However,  if  all  fields  are  stored  in  one  area 
(instead  of  being  directly  associated  with  each  key)  we  can  set  all 
fields  to  zero  by  zeroing  a  contiguous  area  of  core,  which  may  be  done 
very  efficiently.  We  will  call  these  rules  "wait  c,  move  and  clear" 
rules,  where  c  is  the  maximum  value  of  the  field. 

A  second  class  of  rules  (discussed  in  the  next  section) 
behaves  in  a  similar  fashion,  except  that  when  a  key  is  moved,  only  its 
field is reset to zero. These rules will be called "wait c and move"
rules.
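
Both classes of rules can be sketched with one routine. This is a minimal illustration, assuming the counters live in a dictionary keyed by key; the name `wait_c_request` and its `transpose` and `clear_all` flags are hypothetical.

```python
def wait_c_request(lst, counts, key, c, transpose=False, clear_all=True):
    """Process one request under a "wait c" rule and return its search cost.

    lst       -- list of keys, mutated in place
    counts    -- dict mapping key -> counter, mutated in place
    transpose -- use the transposition rule instead of move to front
    clear_all -- True: "wait c, move and clear" (reset every counter);
                 False: "wait c and move" (reset only the moved key's counter)
    """
    pos = lst.index(key)
    counts[key] += 1
    if counts[key] > c:
        if transpose:
            if pos > 0:
                lst[pos - 1], lst[pos] = lst[pos], lst[pos - 1]
        else:
            lst.insert(0, lst.pop(pos))
        if clear_all:
            for k in counts:
                counts[k] = 0
        else:
            counts[key] = 0
    return pos + 1


# Hypothetical usage with c = 1 on a three-key list.
keys = list("abc")
fields = dict.fromkeys(keys, 0)
costs = [wait_c_request(keys, fields, k, 1) for k in "cbc"]

keys2 = list("abc")
fields2 = dict.fromkeys(keys2, 0)
for k in "cbc":
    wait_c_request(keys2, fields2, k, 1, clear_all=False)
```

The dictionary stands in for the separate contiguous counter area mentioned above; a real implementation would more likely use an array that can be zeroed in one operation.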

In  analyzing  these  rules,  we  will  find  that  using  the  count 
fields  in  the  first  manner  will  decrease  the  asymptotic  cost  more  than 
the  second  method.  However,  the  convergence  of  the  first  method  will 
be  much  slower,  since  we  will  not  move  a  key  every  request,  and,  if  the 
maximum  difference  is  very  large,  we  will  move  keys  only  very  rarely. 

We begin our analysis of the wait c, move and clear rules with
the following theorem.

Theorem: Given key request probabilities $p_1, p_2, \ldots, p_n$, the steady
state probability of a given list using a wait c, move and clear rule
is equal to the steady state probability of the list using the
corresponding permutation rule with modified key request probabilities
$p_1(c), p_2(c), \ldots, p_n(c)$, where

$$p_i(c) = \sum_{a_1=0}^{c} \cdots \sum_{a_{i-1}=0}^{c}\; \sum_{a_{i+1}=0}^{c} \cdots \sum_{a_n=0}^{c}
\frac{(c + a_1 + \cdots + a_{i-1} + a_{i+1} + \cdots + a_n)!}{c!\,a_1! \cdots a_{i-1}!\,a_{i+1}! \cdots a_n!}\;
p_i^{\,c+1}\, p_1^{a_1} \cdots p_{i-1}^{a_{i-1}}\, p_{i+1}^{a_{i+1}} \cdots p_n^{a_n}.$$

Proof: Consider the sequence of keys that have been moved by the wait
c, move and clear rule. We have assumed that any two requests are
independent, and that the request probabilities are constant with respect
to time. Because of these assumptions and the fact that we clear the
counts after each move, the move sequence has the following properties:

(1)  Any two moves are independent.

(2)  The probability that the $i$-th move is a given key
does not depend on $i$.

If we use the move sequence as inputs to a permutation rule,
the resulting list will be the same as one obtained by inputting the
original request sequence to the wait c, move and clear rule. We
note that the properties of the move sequence are exactly those required
for a request sequence, so the inputs to the permutation rule can be
thought of as a sequence of requests. However, elements of this sequence
are not chosen using the request probabilities, but using the probability
that a key is moved. The probability that $k_i$ is moved is exactly the
$p_i(c)$ shown in the statement of this theorem.

This formula is derived as follows: If $k_i$ was moved, we
know that $k_i$ has been requested c+1 times, and that the last request
(the one that caused $k_i$ to be moved) must have been for $k_i$. Then for
$j \ne i$, let $k_j$ be requested $a_j$ times ($0 \le a_j \le c$) and sum
over all possible choices for the $a_j$.

This would complete the proof if every request to the wait c,
move and clear rule caused a move. This is not the case since we must
wait after each move while the counts build up. If this waiting time
were dependent on the current state (as it will be for the wait c and
move rules), states with longer waiting times would have proportionally
greater probabilities. Fortunately, this is not the case. After each
move, the counts are reset and hence each state will have the same
expected waiting time. □
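
For small lists the modified probabilities $p_i(c)$ can be evaluated by brute force directly from the formula in the theorem. The sketch below is illustrative: the function name `move_probabilities` is an assumption, and exact fractions are used so that the results can be compared exactly.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def move_probabilities(probs, c):
    """Return [p_1(c), ..., p_n(c)] for request probabilities probs.

    Enumerates every choice of a_j in 0..c for the other keys, so the
    running time grows like (c+1)**(n-1); intended for small n only.
    """
    n = len(probs)
    result = []
    for i in range(n):
        others = [p for j, p in enumerate(probs) if j != i]
        total = Fraction(0)
        for counts in product(range(c + 1), repeat=n - 1):
            # Orderings of the requests that precede the final request
            # for key i: c for key i itself, a_j for each other key.
            denominator = factorial(c)
            weight = probs[i] ** (c + 1)
            for a_j, p_j in zip(counts, others):
                denominator *= factorial(a_j)
                weight *= p_j ** a_j
            total += Fraction(factorial(c + sum(counts)), denominator) * weight
        result.append(total)
    return result


# Hypothetical example data.
example = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
modified = move_probabilities(example, 2)
```

Since some key must eventually be the first to reach c+1 requests, the modified probabilities sum to one, and for c = 0 they reduce to the original request probabilities.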

This proof demonstrates the reason wait c, move and clear rules
outperform permutation rules. In order to be moved, a low probability
key must be requested c+1 times before any other key is requested
c+1 times. Hence these are less likely to be moved. On the other
hand,  high  probability  keys  now  have  a  proportionally  greater  chance. 
Notice,  of  course,  that  the  probability  that  a  key  is  requested 
remains  the  same;  we  are  only  being  more  selective  about  which  key  we 
move. 

Due  to  this  correspondence  between  wait  c,  move  and  clear 
rules  and  permutation  rules,  many  results  from  previous  sections  carry 
over.  Specifically: 

Corollary: Let keys $k_1, k_2, \ldots, k_n$ have request probabilities
$p_1, p_2, \ldots, p_n$ and let $p_1(c), \ldots, p_n(c)$ be defined as in
the previous theorem. Then

(1)  The asymptotic cost of the wait c, move to front and
clear rule is

$$1 + \sum_{i \ne j} \frac{p_i\,p_j(c)}{p_i(c) + p_j(c)}.$$

(2)  For the wait c, transpose and clear rule, the steady
state probability of any given ordering $(k_1 \ldots k_n)$ is

$$\frac{\prod_{i=1}^{n} p_i(c)^{\,n-i}}{N}$$

where $N$ is a normalizing constant.

(3)  The wait c, transpose and clear rule has asymptotic
cost less than or equal to that of the wait c, move to
front and clear rule.

Proof: All result from replacing $p_i$ (the probability a key is moved
by a permutation rule) by $p_i(c)$ (the probability for a wait c, move and
clear rule).

As in the case of the limited difference rule, the performance
approaches the optimum as $c \to \infty$.

Theorem: As $c \to \infty$, the asymptotic costs of the wait c, move to front
and clear rule and the wait c, transpose and clear rule approach the
optimal cost.

Proof: We first examine the wait c, move to front and clear rule.
Consider the probability that $k_j$ is ahead of $k_i$ in the list. This will
be the case if and only if $k_j$ was moved at the most recent time when
either $k_i$ or $k_j$ was moved (i.e. $k_j$ was the most recently moved of
$k_i$ and $k_j$). Thus, the probability is
$\mathrm{Prob}(k_j \text{ was moved} \mid k_i \text{ or } k_j \text{ was moved})$.
This equals the probability that $k_j$ was requested c+1 times
before $k_i$ was requested c+1 times. By the law of large numbers, this
approaches 1 if $p_j > p_i$ and 0 if $p_j < p_i$. Hence the expected cost,
which equals

$$1 + \sum_{i \ne j} p_i\,\mathrm{Prob}(k_j \text{ ahead of } k_i),$$

approaches

$$1 + \sum_{i} p_i\,(i-1) = \sum_{i} i\,p_i,$$

the optimal cost.

By (3) of the previous corollary, the wait c, transpose and
clear rule has cost less than or equal to that of the wait c, move to
front and clear rule, so it also approaches the optimum.

So  both  the  wait  c,  move  and  clear  rules  and  the  limited 
difference  rule  approach  the  optimum  as  the  number  of  bits  they  use 
increases.  The  important  question  is:  which  converges  more  quickly? 
Table  2.9.1  shows  the  limited  difference  rule  makes  "better  use"  of 
its  bits. 

This can also be demonstrated in the case of a list of two
elements (A and B) having probabilities $a$ and $b$ ($= 1-a$). Here the
probability that A is ahead of B equals

$$\sum_{i=0}^{c} \binom{c+i}{i}\, a^{c+1} b^{i}.$$

Table 2.8.1 shows this probability approaches one much more slowly than
that of the limited difference rule.
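
The two probabilities being compared can be computed directly. In this sketch `wait_c_prob` is the sum given above, while `limited_difference_prob` uses a closed form obtained by summing the steady state probabilities of Section 2.7 (an algebraic step not spelled out in the source, valid for $a \ne 1/2$). Both function names are illustrative assumptions.

```python
from math import comb


def wait_c_prob(a, c):
    """Prob(A ahead of B), two element wait c, move to front and clear rule."""
    b = 1.0 - a
    return sum(comb(c + i, i) * a ** (c + 1) * b**i for i in range(c + 1))


def limited_difference_prob(a, n):
    """Prob(A ahead of B), two element limited difference rule, maximum n."""
    b = 1.0 - a
    r = b / a
    return (1.0 - 2.0 * b * r**n) / (1.0 - r ** (2 * n + 1))


# Rows of (c, limited difference, wait c) for a = 0.6.
rows = [
    (c, round(limited_difference_prob(0.6, c), 5), round(wait_c_prob(0.6, c), 5))
    for c in range(0, 11)
]
```

Both rules start from the move to front value 0.6 at c = 0, and the gap between them widens as c grows.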

A major disadvantage of the wait c, move and clear rules is
that they decrease the cost more slowly than the corresponding
permutation rule with modified probabilities, since a counter must exceed
c for a move to be done. The worst case occurs when every key is
requested c times before any key is requested c+1 times. In this case,
a move will be done every cn+1 requests. Thus, the convergence can be
slowed by a factor $\Omega(n)$. On the other hand, the best case occurs when
the same key is requested c+1 times. Here, a move will be made every
c+1 requests and the convergence must be slowed by at least this
constant multiple.


Table 2.8.1
Probability A is ahead of B for a=.6

  c    LIMITED BIT RULE    WAIT C RULE
  0        0.60000           0.60000
  1        0.66316           0.64800
  2        0.74218           0.68256
  3        0.81039           0.71021
  4        0.86446           0.73343
  5        0.90511           0.75350
  6        0.93457           0.77116
  7        0.95536           0.78690
  8        0.96977           0.80106
  9        0.97963           0.81391
 10        0.98632           0.82562
 11        0.99084           0.83636
 12        0.99387           0.84623
 13        0.99591           0.85535
 14        0.99727           0.86379
 15        0.99818           0.87162
 16        0.99878           0.87890
 17        0.99919           0.88569
 18        0.99946           0.89202
 19        0.99964           0.89794
 20        0.99976           0.90348
 21        0.99984           0.90868
 22        0.99989           0.91355
 23        0.99993           0.91812
 24        0.99995           0.92242
 25        0.99997           0.92647
 26        0.99998           0.93028
 27        0.99999           0.93387
 28        0.99999           0.93725
 29        0.99999           0.94045
 30        1.00000           0.94346
 31        1.00000           0.94631
 32        1.00000           0.94900
 33        1.00000           0.95154
 34        1.00000           0.95395
 35        1.00000           0.95623
 36        1.00000           0.95838
 37        1.00000           0.96042
 38        1.00000           0.96236
 39        1.00000           0.96419
 40        1.00000           0.96593
 41        1.00000           0.96757
 42        1.00000           0.96914
 43        1.00000           0.97062
 44        1.00000           0.97203
 45        1.00000           0.97336
 46        1.00000           0.97463
 47        1.00000           0.97584
 48        1.00000           0.97698
 49        1.00000           0.97807
 50        1.00000           0.97910

To get an idea of the average decrease in convergence, we
consider n equally likely keys and c=1. Note that this is the least
favorable key distribution. We now determine the expected number of
requests before a key is requested for a second time. This is

$$\sum_{i=0}^{n} \mathrm{Prob}(\text{no key has been requested twice after } i \text{ requests}).$$

This probability equals the number of sequences of length $i$
of distinct keys $\left(\frac{n!}{(n-i)!}\right)$ divided by the total
number of sequences $(n^i)$, so the sum is

$$\sum_{i=0}^{n} \frac{n!}{(n-i)!}\,\frac{1}{n^{i}}.$$

Replacing $i$ by $n-i$ gives

$$n! \sum_{i=0}^{n} \frac{1}{i!\,n^{n-i}} = n!\,n^{-n} \sum_{i=0}^{n} \frac{n^{i}}{i!} < n!\,n^{-n}e^{n}.$$

Stirling's approximation gives

$$n!\,n^{-n}e^{n} \approx \left(n^{n}e^{-n}\sqrt{2\pi n}\right) n^{-n}e^{n} = \sqrt{2\pi n}.$$

Therefore, for c=1, the expected slowdown is $\Omega(\sqrt{n})$ for this
unfavorable key distribution.
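
The sum above is easy to evaluate exactly for moderate n, which gives a feel for how tight the Stirling bound is. The following is a small sketch; the function name is an illustrative assumption.

```python
from fractions import Fraction
from math import factorial, pi, sqrt


def expected_requests_before_repeat(n):
    """Sum over i = 0..n of Prob(the first i requests are all distinct),
    for n equally likely keys."""
    return sum(
        Fraction(factorial(n), factorial(n - i) * n**i) for i in range(n + 1)
    )


# Pairs of (exact expectation, Stirling-based bound sqrt(2*pi*n)).
comparison = {
    n: (float(expected_requests_before_repeat(n)), sqrt(2 * pi * n))
    for n in (1, 2, 10, 100)
}
```

In these evaluations the exact expectation lies below the bound $\sqrt{2\pi n}$, as the strict inequality in the derivation requires, while still growing with n.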

The wait c, move and clear rules have an interesting
correspondence with the permutation rules. They perform better than
permutation rules because they are more selective about which keys are moved.
However, the performance is not as good as the limited difference rule.
These rules have the further disadvantage of converging more slowly
than permutation rules. For a list of n elements, the convergence is
slowed by a factor between c+1 and nc+1. If c=1, the average
slowdown is approximately $\sqrt{2\pi n}$ for the uniform distribution.

2.9  Wait  c  and  Move  Rules 

We  now  turn  our  attention  to  the  wait  c  and  move  rules,  and 
first  consider  the  wait  c  and  move  to  front  rule. 

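Theorem: Given key probabilities $p_1, p_2, \ldots, p_n$, the asymptotic cost of
the wait c and move to front rule is

$$1 + \sum_{i \ne j} p_i\, x_{ji},$$

where

$$x_{ji} = \frac{p_j}{(p_i+p_j)(c+1)^2} \sum_{k=0}^{c} (c-k+1)\left(\frac{p_i}{p_i+p_j}\right)^{k}
\sum_{m=0}^{c} \binom{m+k}{m}\left(\frac{p_j}{p_i+p_j}\right)^{m}$$

(the probability $k_j$ is ahead of $k_i$ in the list).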

Proof: Recall that the expected cost is

$$1 + \sum_{i \ne j} p_i\,\mathrm{Prob}(k_j \text{ ahead of } k_i)$$

and therefore we must determine this probability. Consider any two keys,
A and B, having probabilities a and b. Note that the relative ordering
of A and B will not be affected when another key is moved. Also, their
counts will remain the same since they are not cleared. Therefore, in
determining Prob(A ahead of B), we can ignore all other keys and requests
to all other keys; we need only consider a list consisting of A and B,
having probabilities $\frac{a}{a+b}$ and $\frac{b}{a+b}$ of being requested.
(For simplicity, we rename these probabilities "a" and "b".)

This list can be modeled by a Markov chain with $2(c+1)^2$
states, $A_{ij}$ and $B_{ij}$ for $0 \le i, j \le c$. State $A_{ij}$
corresponds to the list with A (having count $i$) ahead of B (having count
$j$). State $B_{ij}$ corresponds to B (having count $j$) ahead of A. Note
that the first subscript is always A's count.

Before  solving  for  the  stationary  distribution,  we  must  first 
make  sure  it  will  give  us  Prob(A  ahead  of  B).  There  are  two  possible 
troubles.  First,  as  with  the  wait  c,  move  and  clear  rule,  we  must  wait 
in  each  state  of  the  two  element  chain  while  keys  other  than  A  and  B  are 
being  requested.  However,  since  key  requests  do  not  depend  on  whether 
A  is  ahead  of  B,  or  the  count  of  either  key,  the  requests  are  independent 
of  the  state  and  hence  the  expected  waiting  time  is  the  same  for  each 
state. 

Second, the chain is periodic with period c+1. If we let $r_A$
and $r_B$ be the number of times A and B have been requested, we have
$i = r_A \bmod (c+1)$ and $j = r_B \bmod (c+1)$. Therefore
$i+j \equiv (r_A + r_B) \pmod{c+1}$. Since each transition increases
$r_A + r_B$ by one, if we start at $A_{ij}$ (or $B_{ij}$), it will always
take a multiple of (c+1) transitions to return. Hence the chain has
period c+1.

A chain which is periodic does not converge to its steady
state distribution in the sense that $\lim_{t\to\infty} P_t(x_0,x) = p(x)$,
where $P_t(x_0,x)$ is the probability of going from an initial state $x_0$
to state $x$ in $t$ transitions, and $p(x)$ is the steady state probability
of state $x$. However, for an irreducible chain (which this one is), the
ergodic theorem holds (see appendix). This states that

$$\lim_{t\to\infty} \frac{1}{t} \sum_{i=1}^{t} P_i(x_0,x) = p(x).$$

Hence the "time average" of the probability approaches the
steady state distribution. If $C(t)$ is the expected cost at time $t$,
we are guaranteed

$$\lim_{t\to\infty} \frac{1}{t} \sum_{i=0}^{t} C(i) = \sum_{x} p(x)\,c(x)$$

where $c(x)$ is the cost of state $x$. The cost converges to the asymptotic
cost in this sense. Note that the asymptotic cost is still the stationary
probability of a state times its cost summed over all states; only the
strength of convergence has been changed.

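We now proceed to determine the stationary probability. The
steady state equations are:

$$\begin{aligned}
A_{ij} &= aA_{i-1,j} + bA_{i,j-1}\\
B_{ij} &= aB_{i-1,j} + bB_{i,j-1} && \text{for } 0 < i, j \le c\\
A_{0j} &= bA_{0,j-1} + aA_{cj} + aB_{cj}\\
B_{0j} &= bB_{0,j-1} && \text{for } 0 < j \le c\\
A_{i0} &= aA_{i-1,0}\\
B_{i0} &= aB_{i-1,0} + bA_{ic} + bB_{ic} && \text{for } 0 < i \le c\\
A_{00} &= aA_{c0} + aB_{c0}\\
B_{00} &= bA_{0c} + bB_{0c}
\end{aligned}$$

By adding pairs of equations, we can verify
$A_{ij} + B_{ij} = \frac{1}{(c+1)^2}$. This corresponds to the fairly
obvious fact that asymptotically every pair of counts (without regard to
the order of the list) is equally likely.

Substituting this relation gives

$$\begin{aligned}
A_{ij} &= aA_{i-1,j} + bA_{i,j-1} && \text{for } 0 < i, j \le c\\
A_{0j} &= bA_{0,j-1} + \frac{a}{(c+1)^2} && \text{for } 0 < j \le c\\
A_{i0} &= aA_{i-1,0} && \text{for } 0 < i \le c\\
A_{00} &= \frac{a}{(c+1)^2}
\end{aligned}$$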

This  is  equivalent  to  the  system 


A.  .  =  aA.   ,    .  +  bA.    .   ,  for  0  £  i , j  £  c 


for  0  £  j  *  c 


for  0  *  i  *  c 


For  convenience,  extend  these  recurrences  to  hold  for  all  i , j  ^  0. 
This  will  not  effect  the  A.,  we  are  interested  in.  The  recurrence  can 
now  be  solved  by  the  use  of  generating  functions.  Define 


A-l 

,j 

_     1 

(c+1)2 

Ai, 

-1 

=  0 
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00  00  .         , 


F(x,y)  =     I      I    A.-xV 
1=0  j=0     1J 

=     I      I  (aAj   ,    i  +  bA,   ,  .)xV 
i=0  i=o       '    l,J  1,J    ■ 


00  00 


=  a  I      I  A,,   ,-xV  +  b  I      £  A,   ,  lXy 
i=0  j=0  n    ,,J  i=0  j=0  1,J 

00  00  .  00  .  00  00  . 

=  ax  I      I  A.-xV  +  a  I  A  ,  ,yJ  +  b  I      I  A-.xV 
i=0  j=0  1J  j=0     l,J  1=0  j=0  1J 


=  ax  F(x,y)  + % +     by  F(x,y) 


(c+l)T(l-y) 


Solving  for  F(x,y)  gives 


F(x-y)  =  (,..X  ..)(i^bb7> 


(c+l)'(l-y)    *~by' 


00     .,     00 


=  -L-T  (  Iy1)(  I  (ax+by)J) 
(c+ir  1=0    j=0 

Using  the  binomial  theorem  gives 

j-k    k 


7(  Iy1)(.I  I   (Jk)(ax)   (by)  ) 


(c+l)fc  1=0   j=0  k=0  k' 

=  -^"T  I   I   i  (Jk)a^kby-kyi+k 
(c+1)  1=0  j=0  k=0  K 

Now  substitute  i'  for  j-k  and  j'  for  i+k  and  then  drop  the  primes 

OO      00      J 

■  -J-7    I      I      I  (1kk)a1bkxV 
(c+lr  1=0  j=0  k=0     K 

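Therefore

$$A_{ij} = \frac{a}{(c+1)^2} \sum_{k=0}^{j} \binom{i+k}{k} a^i b^k .$$

Prob(A ahead of B) is then

$$\begin{aligned}
\sum_{i=0}^{c} \sum_{j=0}^{c} A_{ij} &= \frac{a}{(c+1)^2} \sum_{i=0}^{c} \sum_{j=0}^{c} \sum_{k=0}^{j} \binom{i+k}{k} a^i b^k \\
&= \frac{a}{(c+1)^2} \sum_{k=0}^{c} \sum_{i=0}^{c} \binom{i+k}{k} a^i b^k \sum_{j=k}^{c} 1 \\
&= \frac{a}{(c+1)^2} \sum_{k=0}^{c} (c-k+1)\, b^k \sum_{i=0}^{c} \binom{i+k}{k} a^i
\end{aligned}$$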

Recalling that a and b were originally the request probabilities of A and B, each divided by the sum of the two, and substituting into the cost formula finishes the proof.
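The derivation above can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it solves the extended recurrence directly with exact rational arithmetic and compares the result with the closed form for $A_{ij}$ and with the final expression for Prob(A ahead of B). The function names and the sample values of a, b and c are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def a_recurrence(a, b, c):
    """Solve A[i][j] = a*A[i-1][j] + b*A[i][j-1] for 0 <= i, j <= c,
    using the boundary values A[-1][j] = 1/(c+1)^2 and A[i][-1] = 0."""
    edge = Fraction(1, (c + 1) ** 2)
    A = [[Fraction(0)] * (c + 1) for _ in range(c + 1)]
    for i in range(c + 1):
        for j in range(c + 1):
            up = A[i - 1][j] if i > 0 else edge            # A[-1][j]
            left = A[i][j - 1] if j > 0 else Fraction(0)   # A[i][-1]
            A[i][j] = a * up + b * left
    return A

def a_closed_form(a, b, c, i, j):
    """Closed form for A[i][j] obtained from the generating function."""
    s = sum(comb(i + k, k) * a**i * b**k for k in range(j + 1))
    return a * s / (c + 1) ** 2

def prob_ahead(a, b, c):
    """Final expression for Prob(A ahead of B)."""
    total = sum(
        (c - k + 1) * b**k * sum(comb(i + k, k) * a**i for i in range(c + 1))
        for k in range(c + 1)
    )
    return a * total / (c + 1) ** 2

# Hypothetical sample values with a + b = 1.
a, b, c = Fraction(3, 5), Fraction(2, 5), 4
A = a_recurrence(a, b, c)
```

For c = 0 the expression reduces to a, the move to front value, and exchanging a and b gives the complementary probability.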

Another  interesting  fact  about  this  rule  is  that  for  some 
distributions  we  can  prove  that  it  does  not  approach  the  optimum  as 
we  increase  the  number  of  bits. 

Theorem: Given a distribution of key request probabilities, if $p_j < p_i < 2p_j$ for some i and j, the wait c and move to front rule will not approach the optimum as $c \to \infty$.

Proof: We show that Prob($k_i$ ahead of $k_j$) does not approach 1 as $c \to \infty$, hence the cost is bounded away from the optimum.

For convenience, let $a = \frac{p_i}{p_i+p_j}$ and $b = \frac{p_j}{p_i+p_j}$. From the preceding theorem,

$$\begin{aligned}
\text{Prob}(k_i \text{ ahead of } k_j) &= \frac{a}{(c+1)^2} \sum_{k=0}^{c} (c-k+1)\, b^k \sum_{i=0}^{c} \binom{i+k}{k} a^i \\
&< \frac{a}{(c+1)^2} \sum_{k=0}^{c} (c-k+1)\, b^k \sum_{i=0}^{\infty} \binom{i+k}{k} a^i \\
&= \frac{a}{(c+1)^2} \sum_{k=0}^{c} (c-k+1)\, b^k \frac{1}{(1-a)^{k+1}}
\end{aligned}$$

Since $1 - a = b$,

$$\begin{aligned}
&= \frac{a}{b(c+1)^2} \sum_{k=0}^{c} (c-k+1) \\
&= \frac{a}{b(c+1)^2} \left[ (c+1)^2 - \frac{c(c+1)}{2} \right] \\
&= \frac{a}{b} \cdot \frac{c+2}{2c+2}
\end{aligned}$$

which approaches $\frac{a}{2b} = \frac{p_i}{2p_j}$ as $c \to \infty$. Since $p_i < 2p_j$, $\lim_{c \to \infty} \text{Prob}(k_i \text{ ahead of } k_j) \le \frac{p_i}{2p_j} < 1$ and the cost cannot approach the optimum as $c \to \infty$. □
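A small numerical illustration of this bound, given as a sketch under stated assumptions rather than as part of the proof: the value a = 0.6 (that is, $p_i = 1.5\,p_j$) and the function names are hypothetical choices. The exact probability from the preceding theorem is compared with the bound $\frac{a}{b} \cdot \frac{c+2}{2c+2}$ for several values of c.

```python
from math import comb

def prob_ahead(a, c):
    """Exact Prob(k_i ahead of k_j) from the preceding theorem, with b = 1 - a."""
    b = 1.0 - a
    total = sum(
        (c - k + 1) * b**k * sum(comb(i + k, k) * a**i for i in range(c + 1))
        for k in range(c + 1)
    )
    return a * total / (c + 1) ** 2

def upper_bound(a, c):
    """The bound (a/b)(c+2)/(2c+2) derived in the proof; tends to a/(2b)."""
    b = 1.0 - a
    return (a / b) * (c + 2) / (2 * c + 2)

if __name__ == "__main__":
    a = 0.6  # hypothetical example: p_i = 1.5 * p_j
    for c in (0, 1, 2, 5, 10, 30):
        print(c, round(prob_ahead(a, c), 4), round(upper_bound(a, c), 4))
```

With this choice of a the bound tends to 0.75, so the probability that the more probable key is ahead stays bounded away from 1 however large c becomes.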

Indeed,  it  is  reasonable  to  expect  the  theorem  to  hold  for 
all  distributions  except  the  uniform  and  the  distribution  with  a  key  of 
probability  one.  However,  this  conjecture  has  not  yet  been  proved. 

It is interesting to determine why this method decreases the cost over the move to front rule. The wait c, move and clear rule achieved a decrease by altering the probability that a key is moved from the request probabilities to a more favorable distribution. However, the wait c and move rule does not do this. Since a key is moved after every (c+1)st request for it, the move probabilities remain unchanged in the sense that a key requested with probability $p_i$ will account for a fraction, equal to $p_i$, of the total number of moves.

Consider any two keys, $k_i$ and $k_j$. If we assume that moves occur at intervals which are independent of whether or not $k_i$ is ahead of $k_j$, then $k_i$ will be ahead $\frac{p_i}{p_i+p_j}$ of the time and the performance will be the same as the move to front rule. However, this is not the case. After $k_i$ has been moved (assume $p_i > p_j$), its count is set to zero. Asymptotically, $k_j$'s count is uniformly distributed over {0,1,...,c}. After $k_j$ has been moved, its count is zero, and $k_i$'s count ranges from zero to c. Clearly, after $k_j$ has been moved, the next move will occur sooner; the roles of $k_i$ and $k_j$ have been interchanged, and after $k_j$ has been moved, the count of $k_i$ (the more probable key) is closer to causing a move. Therefore, the probability we find $k_i$ ahead of $k_j$ is increased because we must wait longer for the next move when it is ahead of $k_j$.

Finally, we notice that this rule will have much faster convergence than the wait c, move and clear rule since, on the average, it will move a key after every c+1 requests. The performance of this rule is compared with previous rules in Table 2.9.1.
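The "Wait c and Move to Front (Exact)" entries of Table 2.9.1 can be recomputed from the probability derived above. The sketch below assumes that the cost formula referred to in the proof is the usual one, $1 + \sum_i p_i \sum_{j \ne i} \text{Prob}(k_j \text{ ahead of } k_i)$, and that Zipf's Law means $p_i$ proportional to $1/i$; the function names are hypothetical.

```python
from math import comb

def prob_ahead(a, c):
    """Prob(first key ahead of second) when the first key has normalized
    probability a and the second has b = 1 - a."""
    b = 1.0 - a
    total = sum(
        (c - k + 1) * b**k * sum(comb(i + k, k) * a**i for i in range(c + 1))
        for k in range(c + 1)
    )
    return a * total / (c + 1) ** 2

def zipf(n):
    """Zipf's Law on n keys: p_i proportional to 1/i."""
    h = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (i * h) for i in range(1, n + 1)]

def wait_c_mtf_cost(p, c):
    """Asymptotic cost, assuming cost = 1 + sum_i p_i * sum_{j != i} Prob(k_j ahead of k_i)."""
    cost = 1.0
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            if i != j:
                # Each key j found ahead of key i adds one comparison.
                cost += pi * prob_ahead(pj / (pi + pj), c)
    return cost

def optimal_cost(p):
    """Cost of the list sorted by decreasing request probability."""
    return sum(r * q for r, q in enumerate(sorted(p, reverse=True), start=1))

if __name__ == "__main__":
    p = zipf(9)
    print("optimal", round(optimal_cost(p), 4))
    for c in range(6):
        print("c =", c, round(wait_c_mtf_cost(p, c), 4))
```

Running it prints one value per c for a nine element list, which can be set beside the corresponding row of the table.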

Having analyzed these rules, we can see that they are asymptotically inferior to both the wait c, move and clear rules and the limited difference rule. The convergence is faster than that of the wait c, move and clear rules. It is at most c+1 times slower than the corresponding permutation rule, while the wait c, move and clear rule may be as bad as $n^{c+1}$. A final interesting fact is that for some probability distributions, it can be proved that this rule does not approach the optimum as $c \to \infty$. We conjecture this to hold for any probability distribution except the uniform and the distribution with a key of probability one.

Table 2.9.1
Comparison of Rules that use Counters

Rule                                      c=0     c=1     c=2     c=3     c=4     c=5

Limited Difference Rule                   3.9739  3.4162  3.3026  3.2545  3.2288  3.2113
Wait c, Move to Front and Clear (Exact)   3.9739  3.6230  3.4668  3.3811  3.3285
Wait c, Transpose and Clear (Exact)       3.4646  3.3399  3.2929  3.2670  3.2501
Wait c and Move to Front (Exact)          3.9739  3.8996  3.8591  3.8338  3.8165  3.8040
Wait c and Transpose                      3.4646  3.3824  3.3576  3.3473  3.3312  3.3272

Asymptotic costs for various rules assuming a nine element list whose probabilities are given by Zipf's Law. Compare these with the optimal cost, which is 3.1814. Costs for the limited difference rule and the wait c and transpose rule were estimated by simulations consisting of 1000 requests. The average of 200 trials is shown.

2.10  Time Varying Distributions

In  this  section,  we  consider  probability  distributions  that 
vary  with  respect  to  time.  We  first  examine  two  examples  concerning 
the  move  to  front  rule:  one  where  the  probability  of  a  key  decreases 
after  it  has  been  requested,  and  another  where  it  increases. 

The first example supposes we have n keys, $k_1, k_2, \ldots, k_n$.
Assume  the  requests  made  to  this  list  form  a  sequence  of  permutations 
of  these  n  keys.  The  permutations  are  independently  chosen  with  each 
of  the  n!  permutations  being  equally  likely.  A  model  that  satisfies 
this  constraint  is  a  company  that  sends  out  bills  each  month.  Its 
customers  then  pay  their  bills  in  a  random  order. 

Assuming  this  model,  we  can  prove  that  the  move  to  back  rule 
is  the  optimal  rule.  The  proof  is  as  follows:  after  t  requests  out 
of  a  permutation  have  been  made,  each  of  the  remaining  n-t  requests  is 
equally  likely,  and  the  best  we  can  do  is  to  have  these  n-t  keys  (and 
none  of  the  t  previously  requested  keys)  in  the  first  n-t  positions  of 
the  list.  Since  each  key  is  equally  likely  to  be  requested,  the 
ordering  of  the  unrequested  keys  will  make  no  difference.  The  move  to 
back  rule  clearly  achieves  this  and  therefore  must  be  optimal.  Any 
other  rule  will  occasionally  move  the  requested  key  to  one  of  the  first 
n-t  positions,  resulting  in  a  higher  cost.  To  derive  the  average  cost 
for the move to back rule to retrieve all n keys of a permutation, we note that to retrieve the $i$th key, we search through an unordered list of n-i+1 keys. The average cost is then

$$E(\text{Cost}_{MTB}) = \frac{1}{n} \sum_{i=1}^{n} \frac{(n-i+1)+1}{2} = \frac{n+3}{4} .$$

If no rule is applied to the list, each key will be accessed exactly once giving a cost of

$$E(\text{Cost}_{RAND}) = \frac{1}{n} \sum_{i=1}^{n} i = \frac{n+1}{2} .$$

Finally, if the move to front rule is used, accessing the list at time i will first require i-1 comparisons with the previously requested keys. Then we search through an unordered list of n-i+1 keys. The cost is then

$$E(\text{Cost}_{MTF}) = \frac{1}{n} \sum_{i=1}^{n} \left[ (i-1) + \frac{(n-i+1)+1}{2} \right] = \frac{3n+1}{4} .$$

For large n, this cost is about three times that of the move to back rule, and about 50 percent larger than that of doing no moves at all. The reason is obvious:
once  a  key  has  been  requested,  its  probability  of  being  requested  again 
decreases.  In  this  case,  our  strategy  must  be  to  move  requested  keys 
back  in  the  list. 
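The three averages above can be confirmed for small n by brute force. The following sketch is an illustration under stated assumptions (a fixed initial list, one permutation of requests, all n! permutations weighted equally); the function name and the rule labels are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def average_cost(n, rule):
    """Average comparisons per request over all n! request permutations.
    rule is "back" (move to back), "front" (move to front) or "none"."""
    keys = list(range(n))
    total = 0
    requests = 0
    for perm in permutations(keys):
        lst = keys[:]                      # same initial order for every permutation
        for k in perm:
            pos = lst.index(k)
            total += pos + 1               # comparisons needed to find k
            if rule == "front":
                lst.insert(0, lst.pop(pos))
            elif rule == "back":
                lst.append(lst.pop(pos))
        requests += n
    return Fraction(total, requests)

if __name__ == "__main__":
    n = 4
    for rule in ("back", "none", "front"):
        print(rule, average_cost(n, rule))
```

For n = 4 the enumeration should reproduce (n+3)/4, (n+1)/2 and (3n+1)/4 exactly, since at each step the requested key is equally likely to be any of the keys not yet requested.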

Using the move to back rule, the keys will appear in the list
in  the  order  that  they  were  requested.  If  our  clients  have  regular 
habits  and  pay  their  bills  at  about  the  same  time  each  month,  the  access 
time  of  the  move  to  back  rule  will  decrease  further,  and  that  of  the 
move  to  front  rule  will  increase. 

We now consider a second example. Suppose that with probability p, the requested key is the same as the previously requested key. With probability 1-p some distribution $(p_1, p_2, \ldots, p_n)$ over the keys is used. The move to front rule would seem to be the logical choice here (if p is not extremely small) since the first key in the list will have a good chance of being requested again.

To analyze this rule, we note that the probability of a given ordering is not affected by p, but depends only on the $p_i$. We can
view  the  chain  as  waiting  in  each  state  until  a  "normal"  request  is 
made.  During  this  wait,  only  requests  to  the  first  key  are  made,  and 
these  do  not  change  the  order  of  the  list.  In  addition,  the  wait  time 
is  the  same  for  all  states. 

Therefore, with probability p, the first key is found in one comparison. With probability 1-p, the $p_1, p_2, \ldots, p_n$ distribution is used. The cost here is just $1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j}$, the normal move to front cost. Adding these two results gives

$$p \cdot 1 + (1-p) \left[ 1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j} \right] = 1 + 2(1-p) \sum_{1 \le i < j \le n} \frac{p_i p_j}{p_i + p_j} ,$$

a decrease of nearly a factor of 1-p. If p is close to one, the move to front rule gives very good performance.
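As a small worked illustration of this formula (a sketch only; the function names and the sample distribution are assumptions), the cost can be evaluated directly for any base distribution and repeat probability p:

```python
def mtf_cost(probs):
    """Normal asymptotic move to front cost: 1 + 2 * sum_{i<j} p_i p_j / (p_i + p_j)."""
    n = len(probs)
    pair_sum = sum(
        probs[i] * probs[j] / (probs[i] + probs[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 + 2.0 * pair_sum

def mtf_cost_with_repeats(probs, p):
    """Cost when, with probability p, the previous request is repeated
    (found in one comparison) and otherwise probs is used."""
    return p * 1.0 + (1.0 - p) * mtf_cost(probs)

if __name__ == "__main__":
    uniform = [1.0 / 3] * 3        # hypothetical three-key example
    for p in (0.0, 0.5, 0.9):
        print(p, mtf_cost_with_repeats(uniform, p))
```

For the uniform three-key distribution the normal cost is 2, so p = 0.5 gives 1.5, in agreement with $1 + 2(1-p)\sum \frac{p_i p_j}{p_i+p_j}$.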

These two examples point out how much the performance depends on the model for the input requests. For models that cause the requested key to become more probable, the move to front rule will perform well. If the requested key becomes less probable, a rule that moves the requested key back in the list is more suitable. Due to the wide variety of performance, little more can be said unless a specific model is considered.

2.11  Summary  and  Conclusion 

We  first  discussed  the  move  to  front  rule  and  the  transposition 
rule.  Asymptotically,  the  transposition  rule  performs  better.  Rivest 
[2] has shown that for any distribution its asymptotic cost is less than
or  equal  to  that  of  the  move  to  front  rule.  We  calculated  the  asymptotic 
cost  for  several  distributions,  with  the  transposition  rule  showing  about  a 
10  percent  increase  over  the  optimum,  and  the  move  to  front  rule  a  25-38 
percent  increase.  Finally,  a  theorem  by  Rivest  [2]  showed  that  these  costs 
could be at most twice the optimum. Thus, if we expect the number of
requests  to  be  large  compared  to  the  number  of  keys,  the  asymptotic  cost 
will  dominate  and  the  transposition  rule  will  be  superior. 

Asymptotic  cost  is  not  the  only  criterion  for  evaluating  rules. 
A  rule  may  have  very  low  asymptotic  cost,  but  converge  so  slowly  that  it 
is  of  little  practical  value.  We  defined  the  overwork  in  order  to 
measure  the  speed  of  convergence.  The  move  to  front  rule  was  found  to 
have  much  smaller  overwork.  For  two  simple  distributions,  the  move  to 

ft     1  n   1 

front  overwork  was  -i-  (for  an  n  element  list),  compared  to  — ^-  for  the 

transposition rule. For Zipf's Law, the move to front rule has overwork approximately $.057n^2$, while the transposition rule has $\Omega(n^3)$ overwork.

The difference in rates of convergence was also demonstrated by graphs of the time varying cost. From these, we calculated when the total cost of the transposition rule would become less than that of the move to front rule. For Zipf's Law, it appears to take $\Omega(n^2)$ requests before the crossover occurs. Thus, if the number of requests will be small ($O(n^2)$) compared to the number of keys, the move to front rule outperforms the transposition rule.

We  next  considered  a  subset  of  permutation  rules  called 
move  ahead  k  rules.  These  rules  form  a  spectrum  ranging  from  the  move 
ahead  1  rule  (transposition)  to  the  move  ahead  n-1  rule  (move  to  front). 
As  the  parameter  k  is  increased,  the  asymptotic  cost  increases,  but  the 
rate  of  convergence  also  increases. 

A  hybrid  rule  that  initially  moves  keys  to  the  front  and  then 
begins  transposing  was  also  examined.  If  the  anticipated  number  of 
requests  is  neither  large  enough  to  make  the  transposition  rule  a  clear 
choice  nor  small  enough  to  require  the  move  to  front  rule,  the  hybrid 
rule  should  be  used.  It  was  shown  to  combine  the  best  features  of  both 
rules:  initially  it  converges  quickly,  and  asymptotically  it  has  a  low 
cost.  Note  that  it  is  only  in  this  intermediate  region  that  both  fast 
convergence  and  low  asymptotic  cost  are  important.  Outside  of  this 
region, the hybrid performs like either the move to front rule
or  the  transposition  rule,  and  it  is  better  to  use  these  rules  than 
incur  the  overhead  of  the  hybrid.  A  difficulty  was  found  in  deciding 
when  to  switch  rules.  For  Zipf's  Law,  this  point  appears  to  be  .268n  + 
.980. 

The first request rule and the move to front rule were shown to produce any list with the same probability. Thus, these two rules are essentially the same, but there are two important differences. First, the first request rule moves each key only once and therefore is much cheaper to execute than the move to front rule. Second, the first request rule will be "trapped" in some ordering after all keys have been requested. Thus, its cost, averaged over time, has a much higher variance than that of the move to front rule. The first request rule can therefore be used in place of the move to front rule: it is cheaper to execute, but it has the drawback of increasing this variance.

The asymptotic ordering obtained by the first request rule is the same as if the keys were not known a priori and a "new" key (one requested for the first time) is inserted at the end of the list.
Since  this  is  also  the  steady  state  distribution  for  the  move  to  front 
rule,  if  the  initial  list  was  constructed  in  this  manner,  the  move  to 
front  rule  will  not  reduce  the  cost. 

Finally, a hybrid rule between the first request and transposition rules was formulated. This has better performance than the move to front/transposition hybrid, and also, we need not guess when to
switch  rules. 

In  comparing  the  different  rules  we  have  studied,  we  find 
that  the  move  to  front  rule  is  best  if  a  small  number  of  requests  will 
be  made.  The  transposition  rule  is  best  for  a  large  number  of  requests, 
and  the  first  request  rule/transposition  hybrid  should  be  used  for  an 
intermediate number. However, none of these rules should be used if we have storage to keep counters. If this is the case, one of the following methods should be used.

We  next  considered  rules  that  used  counters.  The  first  of 
these  was  the  frequency  count  rule.  The  performance  of  this  rule  is 
optimal  at  any  time  and  asymptotically.  Its  major  disadvantage  is 
the  storage  required  by  the  count  fields.  This  can  be  reduced  by 
storing  the  differences  between  successive  counts. 

The  limited  difference  rule  put  an  upper  bound  on  the  size  of 
these  differences.  The  performance  is  no  longer  optimal,  but  approaches 
the  optimum  as  the  upper  bound  on  the  differences  goes  to  infinity. 
This  upper  bound  need  not  be  too  large.  Even  for  small  bounds,  the 
performance  is  nearly  optimal. 

The  wait  c,  move  and  clear  rules  improve  on  the  performance  of 
the  corresponding  permutation  rule  by  altering  the  probability  that  a 
key is moved. As $c \to \infty$, the performance approaches the optimum. However, the performance of the limited difference rule is better and these rules also have the disadvantage of converging very slowly.

A  final  class  of  rules  was  the  wait  c  and  move  rules.  These 
rules  also  improved  upon  their  corresponding  permutation  rule,  but  the 
cost does not approach the optimum as $c \to \infty$, and these rules were
outperformed  by  both  the  limited  difference  rule  and  the  wait  c,  move  and 
clear  rules. 

In comparing the different rules using counters, the frequency count rule should be the choice if enough storage can be spared for the counters so that there is no possibility of overflow. If this is not the case, the limited difference rule appears to be the best. Its asymptotic cost is the lowest, and it does not have the slow convergence of either of the "wait" rules.

Finally, we considered the effects of a time varying distribution. In one example, the move to back rule is optimal, and the move to front rule is poorer than simply leaving the list unchanged. In another example, the move to front rule performed quite well. The difference is that once a key has been requested in the first example, its probability of being requested decreases. In the second example, it increases.

3.   BINARY  SEARCH  TREES


In this chapter, we will discuss extensions of the previously discussed techniques to binary search trees. The standard definitions given in Knuth [3] will be used. Note that the cost function is still $\sum_{i=1}^{n} p_i c_i$, but now $c_i$ is the level of $k_i$ (with the root having level one).

Here  it  is  necessary  to  assume  that  there  is  an  ordering  imposed  on  the 
keys.  The  tree  search  algorithm  requires  that  every  node  in  the  tree 
must  be  greater  than  every  node  in  its  left  subtree  and  less  than  every 
node  in  its  right  subtree.  Any  transformation  we  perform  on  the  tree 
must  preserve  this  property. 

Results that are related to this topic fall into two categories.
The  first  assumes  that  key  request  probabilities  are  known  a  priori.  If 
this  is  the  case,  an  algorithm  by  Knuth  [11]  can  be  used  to  determine 
the  optimal  binary  tree.  Heuristic  algorithms  that  build  near-optimal 
trees,  but  require  less  space  and  time  have  been  discovered  by  Bruno 
and Coffman [12], Mehlhorn [13], and Walker and Gottlieb [14].

The  second  category  of  results  contains  the  height  balanced 
trees  of  Adelson-Velskii  and  Landis  [15],  and  the  weight  balanced  trees 
of  Nievergelt  and  Reingold  [16].  These  methods  balance  the  tree  when 
keys  are  inserted  and  deleted.  No  changes  are  made  if  a  key  already  in 
the  tree  is  requested,  so  these  methods  do  not  take  advantage  of  a 
favorable  probability  distribution  by  moving  more  frequently  accessed 
keys  nearer  the  root.  The  methods  we  will  examine  do  not  fall  into 
either category since we assume the probabilities are not known a priori, and we will suppose that there are high probability keys which we wish to move near the root of the tree.

We  will  study  the  two  transformations  shown  below,  called 
"rotations." 


Figure  3.  The  two  rotations. 

Here the circles labeled A and B are nodes, and the triangles labeled $S_1$, $S_2$ and $S_3$ are subtrees. Note that this transformation can be performed at any node in the tree. This pair of transformations is acceptable since it does preserve the ordering of the tree; after a rotation, the left subtree of A contains only keys less than A, and the right subtree contains only those greater, and similarly for B. This pair of transformations is also "complete" in the following sense:

Theorem: Let $T_1$ and $T_2$ be two binary search trees that have the same set of keys. Then $T_1$ can be transformed into $T_2$ by a sequence of rotations.

Proof: Let r be the root of $T_2$. We can bring r to the root of $T_1$ by using rotations to successively promote it until it reaches the root. Since rotations preserve the ordered property of the tree, the nodes in r's left subtree will be less than r, and the nodes in its right subtree will be greater. We then recursively apply this procedure to each subtree to generate first the left and then the right subtrees of $T_2$. Note that the transformations applied to the right subtree will leave the left subtree unchanged. Hence, this procedure successfully produces $T_2$. □
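The constructive proof above can be sketched in code. This is an illustrative sketch under assumptions, not the thesis's own implementation: the `Node` class, the function names, and the convention that a right rotation promotes the left son (A as the left son of B, which the lost figure presumably showed) are all hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(b):
    """Promote b's left son a; a's right subtree becomes b's left subtree."""
    a = b.left
    b.left, a.right = a.right, b
    return a

def rotate_left(a):
    """Promote a's right son b; b's left subtree becomes a's right subtree."""
    b = a.right
    a.right, b.left = b.left, a
    return b

def promote_to_root(node, key):
    """Bring key (assumed present) to the root of this subtree by rotations."""
    if key < node.key:
        node.left = promote_to_root(node.left, key)
        return rotate_right(node)
    if key > node.key:
        node.right = promote_to_root(node.right, key)
        return rotate_left(node)
    return node

def transform(t1, t2):
    """Reshape t1 into the shape of t2 (same key set) using only rotations."""
    if t2 is None:
        return None
    t1 = promote_to_root(t1, t2.key)             # root of T2 becomes root of T1
    t1.left = transform(t1.left, t2.left)        # then fix the left subtree
    t1.right = transform(t1.right, t2.right)     # and the right subtree
    return t1

def shape(t):
    """Nested-tuple description of a tree, used to compare shapes."""
    return None if t is None else (shape(t.left), t.key, shape(t.right))

# Hypothetical example: a right-leaning chain reshaped into a bushier tree.
chain = Node(1, None, Node(2, None, Node(3, None, Node(4, None, Node(5)))))
target = Node(4, Node(2, Node(1), Node(3)), Node(5))
result = transform(chain, target)
```

After the call, `shape(result)` equals `shape(target)`; each recursive step touches only the subtree it was handed, which mirrors the remark that work on the right subtree leaves the left subtree unchanged.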

We  consider  these  transformations  as  a  mechanism  to  move  node 
A and subtree $S_1$ to higher positions in the tree when we suspect they
contain  high-probability  nodes.  The  important  question  is:  when  should 
the  transformation  be  used?  The  following  sections  describe  several 
different  rules  for  using  the  transformations. 

3.1  Transform  after  Every  Request 

We  first  consider  rules  analogous  to  the  transposition  and  move 
to  front  heuristics.  The  move  to  root  rule  uses  rotations  to  repeatedly 
promote  the  requested  node  until  it  becomes  the  root  of  the  tree.  The 
move  up  one  rule  uses  a  rotation  to  promote  the  requested  node  one  level. 

Although  it  is  not  immediately  obvious,  the  operation  of 
moving  a  node  to  the  root  can  easily  be  done  during  the  search  for  the 
requested key. Suppose x has been requested and has subtrees $S_L$ and $S_R$. Let $\ell_1, \ldots, \ell_i$ be the ancestors of x which are less than x (labeled in order of distance from the root) and suppose they have left subtrees $S_{\ell_1}, \ldots, S_{\ell_i}$. Similarly, let $r_1, \ldots, r_j$ be the ancestors greater than x with right subtrees $S_{r_1}, \ldots, S_{r_j}$. We can then construct a tree with x as its root as shown below.

Figure 3.1.1  Moving node x to the root. (Diagram labels: "ancestors less than x" and "ancestors greater than x".)


To  accomplish  this  transformation  during  the  search,  we  simply 
keep  a  list  of  the  ancestors  of  x  which  are  less  than  x  using  their  right 
pointers  and  a  list  of  those  greater  than  x  using  their  left  pointers. 
When x is found, its sons become $\ell_1$ and $r_1$, the heads of these two lists. $S_L$ becomes the right subtree of $\ell_i$, and $S_R$ becomes the left subtree of $r_j$. The case where x has no left (or right) ancestors is easily handled by a few tests.
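A minimal sketch of this one-pass construction follows. It is an illustration under assumptions: the `Node` class and function names are hypothetical, the requested key is assumed to be present in the tree, and a dummy header node stands in for the "few tests" that handle a missing left or right ancestor.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def move_to_root(root, x):
    """Search for x (assumed present) and return it as the new root.
    Smaller ancestors are chained through right pointers and larger
    ancestors through left pointers while the search descends."""
    header = Node(None)                 # header.right / header.left hold the list heads
    less_tail = greater_tail = header
    node = root
    while node.key != x:
        if node.key < x:
            less_tail.right = node      # append to the list of smaller ancestors
            less_tail = node
            node = node.right
        else:
            greater_tail.left = node    # append to the list of larger ancestors
            greater_tail = node
            node = node.left
    less_tail.right = node.left         # S_L hangs to the right of the last smaller ancestor
    greater_tail.left = node.right      # S_R hangs to the left of the last larger ancestor
    node.left, node.right = header.right, header.left
    return node

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def shape(t):
    return None if t is None else (shape(t.left), t.key, shape(t.right))

# Hypothetical example tree; key 7 is requested.
tree = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(6, None, Node(7)), Node(9)))
tree = move_to_root(tree, 7)
```

In the example, 5 and 6 are the smaller ancestors and 8 is the larger one, so 7 ends up with 5 as its left son and 8 as its right son, which is the same tree that repeated single rotations would produce.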

To analyze the performance of the move to root rule, we introduce the first request rule. The first time a key is requested, it is promoted in the tree until it reaches the root or becomes the son of a previously requested key. The resulting tree is the same as the one obtained by inserting each "new" key (one requested for the first time) into the tree. As in the case of the linked list, the move to root rule and the first request rule behave identically. To prove this, we first make three observations.

Observation  1 :  Consider  a  tree  that  is  modified  by  the  move  to  root 
rule.  For  any  two  keys,  x  and  y,  if  a  key  which  is  between  x  and  y  in 
the  ordering  on  the  keys  is  requested,  x  will  not  be  an  ancestor  of  y 
in  the  resulting  tree. 

Proof:  Since  the  root  of  the  resulting  tree  is  between  x  and  y,  they 
will  be  in  different  subtrees  of  the  root. 

Observation  2:  If  x  is  an  ancestor  of  y  and  a  key  that  is  not  between 
x  and  y  is  requested,  x  will  still  be  an  ancestor  of  y. 

Proof:  Suppose  that  z  has  been  requested.  If  z  is  not  a  descendant  of 
x,  the  tree  rooted  at  x  will  not  be  altered,  so  x  will  still  be  an 
ancestor  of  y.  If  z  is  a  descendant  of  x,  it  will  be  moved  up  in  the 
tree until it becomes a son of x. There are two cases.

Case (1): z < x. The rotation looks like:

Figure 3.1.2

where y is in either of the shaded subtrees. Note that y cannot be in the leftmost subtree since then z would be between x and y.

Case (2): z > x. The transformation is:

Figure 3.1.3

where, again, y must be in either shaded subtree.

In  either  case,  x  is  still  an  ancestor  of  y.  Since  z  is  no 
longer  a  descendant  of  x,  any  further  rotations  will  leave  x  as  an 
ancestor  of  y,  and  therefore  Observation  2  must  be  true. 

Observation  3:  If  neither  x  nor  y  is  the  ancestor  of  the  other  and  a 
key  that  is  not  between  x  and  y  is  requested,  then  neither  x  nor  y  will 
become  the  ancestor  of  the  other. 

Proof:  If  neither  x  nor  y  is  the  ancestor  of  the  other,  there  exists 
some  w  that  is  between  x  and  y,  and  an  ancestor  of  both.  Since  z  is 
not  between  x  and  y,  it  cannot  be  between  x  and  w  and  it  cannot  be 
between  y  and  w.  By  Observation  2,  w  will  still  be  an  ancestor  of  both 
x  and  y  in  the  resulting  tree  and  hence  neither  x  nor  y  will  become  the 
ancestor of the other. □

We now give a lemma that characterizes exactly when one node will be an ancestor of another, based on the sequence of requests and the initial tree.

Lemma  1  :  Node  x  will  be  an  ancestor  of  node  y  using  the  move  to  root 
rule  if  and  only  if: 

(1)  Neither x, nor y, nor any key between them in the ordering
on  the  keys  has  been  requested,  and  x  was  an  ancestor 
of  y  in  the  initial  tree. 
OR  (2)  Neither  y  nor  any  key  between  x  and  y  has  been  requested 
after  the  most  recent  request  for  x. 

Proof:  ("if"  part) 

(1)  =>  Lemma  follows  from  Observation  2. 

(2)  =>  Lemma  follows  from  Observation  2  and  the  fact  that 
when  x  is  requested,  it  becomes  the  root  of  the  tree  and 
hence  is  an  ancestor  of  every  other  node. 

("only  if"  part) 

Case  1  (x  has  not  been  requested) 

Suppose that x has not been requested, and it is an ancestor of y. We will show that this must imply (1). From the observations it is clear that the only way x can become an ancestor of y (if it is not already) is for x to be requested. Since x was never requested, it must have originally been an ancestor of y. Then, from Observation 2, no key between x and y can have been requested since then x would no longer be an ancestor of y. Similarly, y cannot have been requested. Therefore (1) holds.

Case 2 (x has been requested)

Here we show (2) must hold. Consider the situation after the most recent request for x: x is an ancestor of y, and x will not be requested again. This is the same situation as in Case 1 and by using its proof, we can show (2) must hold. □
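Lemma 1 lends itself to an empirical check. The sketch below is an illustration only: it applies the move to root rule to a random request sequence on a small tree and, after every request, compares the actual ancestor relation with the one predicted by conditions (1) and (2). Integer keys, the tree size, the seed, and all function names are assumptions made for the example.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def move_to_root(root, x):
    """One-pass move to root; x is assumed to be in the tree."""
    header = Node(None)
    less_tail = greater_tail = header
    node = root
    while node.key != x:
        if node.key < x:
            less_tail.right = node; less_tail = node; node = node.right
        else:
            greater_tail.left = node; greater_tail = node; node = node.left
    less_tail.right, greater_tail.left = node.left, node.right
    node.left, node.right = header.right, header.left
    return node

def is_ancestor(root, x, y):
    """True if x lies on the search path from the root to y (x != y)."""
    node, seen_x = root, False
    while node.key != y:
        seen_x = seen_x or node.key == x
        node = node.left if y < node.key else node.right
    return seen_x

def lemma1_predicts(x, y, requests, was_ancestor):
    lo, hi = min(x, y), max(x, y)
    blocks = lambda r: lo <= r <= hi and r != x   # y or a key between x and y
    if x in requests:                              # condition (2)
        last = len(requests) - 1 - requests[::-1].index(x)
        return not any(blocks(r) for r in requests[last + 1:])
    return was_ancestor and not any(blocks(r) for r in requests)   # condition (1)

def check_lemma1(n=7, steps=40, seed=0):
    rng = random.Random(seed)
    keys = list(range(n))
    rng.shuffle(keys)
    root = None
    for k in keys:
        root = insert(root, k)
    initial = {(x, y): is_ancestor(root, x, y)
               for x in range(n) for y in range(n) if x != y}
    requests = []
    for _ in range(steps):
        r = rng.randrange(n)
        requests.append(r)
        root = move_to_root(root, r)
        for (x, y), was in initial.items():
            if is_ancestor(root, x, y) != lemma1_predicts(x, y, requests, was):
                return False
    return True
```

A call such as `check_lemma1(seed=3)` returns True when the prediction and the tree agree at every step of that run.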

We  also  prove  the  following  lemma  about  the  first  request 
rule. 

Lemma  2:  Node  x  will  be  an  ancestor  of  node  y  using  the  first  request 
rule  if  and  only  if: 

(1)  Neither  x  nor  y  nor  any  node  between  them  has  been 

requested  and  x  was  an  ancestor  of  y  in  the  original 
tree. 

OR  (2)  Neither  y  nor  any  node  between  x  and  y  was  requested 
before  the  first  request  for  x. 

Proof:  Case  1  (x  has  not  been  requested.) 

First note that the three observations still hold if x has not been requested. Then, as we noted before, once the requested node (z) is no longer a descendant of x, further rotations involving z do not affect the tree rooted at x. Hence the two rules "look the same" to an unrequested x because the only differences occur after z is no longer a descendant of x. Therefore the proof for Lemma 1 is valid and Case 1 is proved.

Case 2 (x has been requested)

To see what happens when x is first requested, consider the previously requested keys and label them $k_1, k_2, \ldots, k_n$ so that $k_1 < k_2 < \cdots < k_n$. They occur in a group at the top of the tree and divide the unrequested nodes into n+1 different subtrees. The leftmost of these subtrees contains all keys less than $k_1$, and the rightmost contains all keys greater than $k_n$. Each of the remaining subtrees consists of all keys between two "adjacent" $k_i$. (See Figure 3.1.4.)

Thus, two unrequested nodes, x and y, are in the same subtree if and only if no key between them has been requested. When x is first requested, it moves to the root of the subtree it is in and becomes an ancestor of all nodes in that subtree. Therefore x becomes the ancestor of y if and only if neither y nor any key between x and y has been requested. x will then remain the ancestor of y since no node can move up past x and out of its subtree, proving Case 2. □

We  can  now  prove  the  main  theorem. 

Theorem:  Given  any  initial  tree,  the  probability  of  obtaining  a  given 
final  tree  after  any  number  of  requests  is  the  same  for  the  move  to  root 
rule  and  the  first  request  rule. 
Keys $k_1, k_2, k_3$ and $k_4$ have been requested. $S_1$ contains
all keys less than $k_1$. $S_5$ contains all keys greater
than $k_4$. $S_i$ contains all keys between $k_{i-1}$ and $k_i$ for
$i = 2, 3, 4$.


Figure  3.1.4  How  the  requested  keys  divide  the  tree. 
Proof: Consider any sequence of requests $r_1, r_2, \ldots, r_t$ as inputs to the
move to root rule and the reversed sequence $r_t, r_{t-1}, \ldots, r_1$ as inputs to
the first request rule. Note that these two sequences have the same
probability. Trivially, the conditions of Lemma 1 hold if and only if
the conditions of Lemma 2 hold. This means that x is an ancestor of y
in  one  tree  if  and  only  if  it  is  an  ancestor  of  y  in  the  other.  Since 
this  information  allows  us  to  uniquely  construct  a  tree,  the  two  trees 
are  the  same  and  the  theorem  is  proved. 

Note that the theorem also holds if we are given a
probability distribution over the initial trees since the two rules
perform identically on each tree.

As  is  the  case  with  linked  lists,  the  first  request  rule  creates 
the  same  tree  as  if  the  keys  were  not  known  a  priori,  and  each  "new"  key 
(one  requested  for  the  first  time)  was  inserted  into  the  tree.  If  the 
initial  tree  was  created  in  this  manner,  the  move  to  root  rule  will  not 
decrease  the  cost. 

The  characterization  given  in  Lemma  1  allows  us  to  determine 
the  time  varying  and  steady  state  costs  for  the  move  to  root  rule.  As 
stated  in  the  theorem,  these  will  equal  the  cost  of  the  first  request 
rule. 

Theorem: If key $k_i$ has probability $p_i$ of being requested and the keys
are ordered $k_1 < k_2 < \cdots < k_n$, the cost for the move to root rule after
t requests is:

\[
1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{B_{ij}}
  + \sum_{1 \le i < j \le n} \left[ p_i A_{ji} + p_j A_{ij} - \frac{2 p_i p_j}{B_{ij}} \right] (1 - B_{ij})^t
\]

where $A_{ij}$ is the probability that $k_i$ is an ancestor of $k_j$ in the initial
tree and $B_{ij} = \sum_{k=\min(i,j)}^{\max(i,j)} p_k$, the probability of requesting a key between
$k_i$ and $k_j$ inclusive.

Proof: We first determine Prob($k_i$ is an ancestor of $k_j$ after t requests).
By Lemma 1, this is the sum of:

(1) The probability that neither $k_i$ nor $k_j$ nor any key
between them has been requested in t requests and $k_i$
was originally an ancestor of $k_j$. This probability is
$(1 - B_{ij})^t A_{ij}$.
and (2) The probability that neither $k_j$ nor a key between $k_i$
and $k_j$ has been requested since $k_i$'s most recent request.
This equals the probability that $k_i$ was most recently
requested at time m and neither $k_j$ nor any key between
$k_i$ and $k_j$ nor $k_i$ (since m was the most recent request)
was requested after time m. Hence the probability of
(2) equals

\[
\sum_{m=1}^{t} p_i (1 - B_{ij})^{t-m} = p_i \, \frac{1 - (1 - B_{ij})^t}{B_{ij}}
\]

Therefore Prob($k_i$ is an ancestor of $k_j$ at time t) =

\[
(1 - B_{ij})^t A_{ij} + p_i \, \frac{1 - (1 - B_{ij})^t}{B_{ij}}
\]



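Then

\[
E(\mathrm{Cost}) = 1 + \sum_{i=1}^{n} p_i \sum_{j \ne i} \mathrm{Prob}(k_j \text{ is an ancestor of } k_i \text{ at time } t)
\]
\[
= 1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{B_{ij}}
  + \sum_{1 \le i < j \le n} \left[ p_i A_{ji} + p_j A_{ij} - \frac{2 p_i p_j}{B_{ij}} \right] (1 - B_{ij})^t
\]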

□ 

It is interesting to note that the asymptotic cost equals

\[
1 + 2 \sum_{1 \le i < j \le n} \frac{p_i p_j}{B_{ij}},
\]

which bears a striking resemblance to the asymptotic
cost of the move to front rule ($p_i + p_j$ has been replaced by $B_{ij}$ in the
denominator). Also, the formula gives the initial cost (t=0) as

\[
1 + \sum_{1 \le i < j \le n} \left[ p_i A_{ji} + p_j A_{ij} \right].
\]

For a tree built by random insertion, $k_i$ will be the ancestor of $k_j$
(i<j) if and only if it is the first to be inserted from the set of all
keys from $k_i$ to $k_j$ inclusive. Since each of these j-i+1 keys is equally
likely to be first, this gives us $A_{ij} = A_{ji} = \frac{1}{j-i+1}$, and the cost
equals

\[
\Bigl( \sum_{i=1}^{n} p_i \left[ H_i + H_{n-i+1} \right] \Bigr) - 1.
\]
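The cost formula above is easy to evaluate mechanically. The following is a minimal sketch, not part of the original analysis: the function names (`mtr_cost`, `random_insertion_A`) and the sample distribution are our own illustrative choices, and keys are indexed from 0 rather than 1. It evaluates the time varying cost with exact rational arithmetic and checks that, for an initial tree built by random insertion, the value at t = 0 agrees with the harmonic number expression.

```python
from fractions import Fraction


def harmonic(m):
    """Return the harmonic number H_m as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, m + 1))


def mtr_cost(p, A, t):
    """Expected move to root cost after t requests.

    p[i]    -- request probability of the i-th smallest key (0-based)
    A[i][j] -- probability that key i is an ancestor of key j initially
    """
    n = len(p)
    cost = Fraction(1)
    for i in range(n):
        for j in range(i + 1, n):
            B = sum(p[i:j + 1])               # B_ij: keys i..j inclusive
            steady = 2 * p[i] * p[j] / B      # asymptotic contribution
            transient = p[i] * A[j][i] + p[j] * A[i][j] - steady
            cost += steady + transient * (1 - B) ** t
    return cost


def random_insertion_A(n):
    """A_ij = 1/(|i-j|+1) for a tree built by random insertion."""
    return [[Fraction(0) if i == j else Fraction(1, abs(i - j) + 1)
             for j in range(n)] for i in range(n)]


# Hypothetical example distribution on four keys.
p = [Fraction(k, 10) for k in (4, 3, 2, 1)]
A = random_insertion_A(len(p))

initial = mtr_cost(p, A, 0)
closed = sum(p[i] * (harmonic(i + 1) + harmonic(len(p) - i))
             for i in range(len(p))) - 1
asymptotic = 1 + 2 * sum(p[i] * p[j] / sum(p[i:j + 1])
                         for i in range(len(p))
                         for j in range(i + 1, len(p)))
print(initial, closed, float(mtr_cost(p, A, 500)), float(asymptotic))
```

Using `Fraction` keeps the comparison at t = 0 exact; for large t the transient term vanishes geometrically, so a floating point comparison against the asymptotic cost suffices.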

We  now  consider  the  move  up  one  rule.  We  would  expect  it  to 
have  lower  asymptotic  cost  than  the  move  to  root  rule  (as  an  analogy  to 
the  case  of  linked  lists).  However,  simulations  (See  Table  3.1.1) 
suggest  that  this  is  not  the  case.  The  move  to  root  rule  had  signifi- 
cantly lower  asymptotic  cost  in  four  out  of  six  distributions  tested. 
In  addition  its  average  asymptotic  cost  was  lower. 
Table 3.1.1

A Comparison of the Move to Root
and Move Up One Rules

                                        ZIPF'S LAW
                    English
                    Letters    #1      #2      #3      #4      #5    Average

Random Cost          5.15     7.26    7.50    7.27    7.33    7.63    7.40
Optimal Cost         3.32     4.10    3.93    4.16    4.06    3.96    4.04
MTR Cost (Exact)     4.31     5.63    5.53    5.68    5.59    5.55    5.59
Increase Over
Optimum              29.8%    37.0%   40.7%   36.5%   37.6%   40.0%   38.4%
Move Up One Rule
Cost                 4.77     6.27    5.52    6.07    6.18    5.43    5.90
Increase Over
Optimum              43.8%    52.8%   40.6%   45.9%   52.1%   37.1%   45.7%

A simulation was run to determine the cost of the move up one rule
(and other rules appearing in later tables). Trees were randomly
generated, and the cost and other statistics were recorded after
500 requests. The probability distributions we considered were the
English letters and five others generated by choosing a random
ordering of 100 keys whose probabilities were given by Zipf's Law.
Further evidence is provided by considering these rules
applied to a tree of only three nodes. We label the keys A, B and C
with probabilities a, b and c respectively. The rules form Markov
chains with five states, each corresponding to a different tree of
three nodes. The transition matrices are easily determined, and from
them, the steady state cost is easily obtained. This calculation was
done for a = 0, .01, .02, ..., .99, 1, b = 0, .01, ..., 1-a and c = 1-a-b.
The results are shown in Figure 3.1.5. Note that the move to root
rule does outperform the move up one rule for a considerable number
of the data points. Since these calculations were not simulations, but
were done exactly (within the precision of the computer), the move to
root rule does have a lower asymptotic cost for some distributions, and
hence a theorem showing the move up one rule to always be superior (as
for linked lists) cannot be true.
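The three node calculation can be sketched in a few lines. The code below is an illustrative reconstruction under our own assumptions, not the program used for Figure 3.1.5: trees are represented as nested tuples `(key, left, right)`, the steady state is approximated by repeatedly applying the transition rule to a distribution over the five trees (rather than by solving the linear system), and the probabilities (0.2, 0.3, 0.5) are an arbitrary example. For the move to root rule the result can be compared with the asymptotic cost formula derived earlier.

```python
def all_trees(keys):
    """Every binary search tree on the sorted tuple `keys`."""
    if not keys:
        return [None]
    return [(k, left, right)
            for i, k in enumerate(keys)
            for left in all_trees(keys[:i])
            for right in all_trees(keys[i + 1:])]


def promote(tree, key):
    """Rotate `key` up one level (no change if it is already the root)."""
    k, left, right = tree
    if key == k:
        return tree
    if key < k:
        if left[0] == key:                       # right rotation at k
            return (left[0], left[1], (k, left[2], right))
        return (k, promote(left, key), right)
    if right[0] == key:                          # left rotation at k
        return (right[0], (k, left, right[1]), right[2])
    return (k, left, promote(right, key))


def move_up_one(tree, key):
    return promote(tree, key)


def move_to_root(tree, key):
    while tree[0] != key:
        tree = promote(tree, key)
    return tree


def depth(tree, key, level=1):
    """Level of `key` in `tree`, counting the root as level 1."""
    k, left, right = tree
    if key == k:
        return level
    return depth(left if key < k else right, key, level + 1)


def steady_state_cost(rule, probs, iterations=2000):
    """Approximate the stationary expected search cost under `rule`."""
    keys = tuple(range(len(probs)))
    states = all_trees(keys)
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = dict.fromkeys(states, 0.0)
        for s, mass in dist.items():
            for key, p in zip(keys, probs):
                nxt[rule(s, key)] += mass * p
        dist = nxt
    return sum(mass * p * depth(s, key)
               for s, mass in dist.items()
               for key, p in zip(keys, probs))


probs = (0.2, 0.3, 0.5)                          # a, b, c (example values)
mtr = steady_state_cost(move_to_root, probs)
muo = steady_state_cost(move_up_one, probs)
closed_form = 1 + 2 * sum(probs[i] * probs[j] / sum(probs[i:j + 1])
                          for i in range(3) for j in range(i + 1, 3))
print(mtr, closed_form, muo)
```

Sweeping `probs` over a grid of (a, b) values and comparing `mtr` with `muo` would give a picture of the kind described for Figure 3.1.5, although we make no claim that the iteration above matches the precision of the original calculation.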

To get an intuitive idea of why the move up one rule can
behave poorly, let us consider what will cause a given node to move up
in the tree. For a given node B, let us look at $A_1, A_2, \ldots, A_m$, the
ancestors of B, ordered in increasing distance from the root. From
the properties of the rotations, we can verify that B will move up in
the tree if B itself is requested or if $A_i$ is requested and $A_{i-1} < A_i < B$
or $A_{i-1} > A_i > B$. B will move down one level if either of its sons is
requested or the son of $A_i$ that is not $A_{i+1}$ is requested. Thus, we can
see that the movements of B are controlled by much more than just its
probability. If B is far from the root, it may be difficult for B to
move up in the tree. On the other hand, the move to root rule promotes
nodes all the way to the root of the tree, so high probability nodes
cannot spend a lot of time "trapped" far from the root.

The curve in this figure shows those a and b where the cost of the move
to root rule equals that of the move up one rule. The move to root rule
has lower cost in the region to the right of the curve (53% of the total
area) and the move up one rule has lower cost in the region to the left.


Figure 3.1.5

We  derived  a  closed  form  for  the  cost  of  the  move  to  root 
rule  as  a  function  of  time  which  bore  a  striking  resemblance  to  that 
of  the  move  to  front  rule.  The  move  to  root  rule  was  also  shown  to  be 
identical  to  the  first  request  rule.  A  simulation  estimating  the  cost 
of  the  move  up  one  rule  suggested  it  was  often  inferior  to  the  move  to 
root  rule  (see  Table  3.1.1).  Both  rules  performed  well  and  provided 
reasonable  decreases  over  the  cost  of  a  random  tree.  The  move  to  root 
rule  averaged  within  38  percent  of  the  optimum,  while  the  move  up  one 
rule  was  within  45  percent.  These  average  costs  suggest  that  the  move 
to  root  rule  would  be  the  better  choice. 

3.2  Monotonic  Trees 

Another  method  for  getting  more  frequently  accessed  nodes 
high  in  the  tree  is  to  keep  a  frequency  count  associated  with  each  node. 
The  node  with  the  largest  count  becomes  the  root  of  the  tree,  and  each 
subtree  is  formed  recursively,  using  the  same  rule.  Such  a  tree  is 
called  monotonic  because  the  frequency  count  for  any  given  node  is 
greater  than  or  equal  to  that  of  any  of  its  descendants.  (This  property 
is  the  same  as  the  one  required  for  a  heap,  see  Williams  [17].) 

It  is  a  simple  matter  to  keep  the  tree  ordered  in  this  manner. 
Rotations  are  used  to  promote  the  requested  key  until  a  key  with  equal 
or  greater  count  is  encountered.  The  resulting  tree  will  have  the 
monotonic  property. 
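The maintenance step just described can be sketched as follows. This is only an illustration under our own assumptions: the `Node` class, the recursive formulation, and the helper names are hypothetical, and every key is assumed to be present in the tree already with a count of zero. After the requested node's count is incremented, it is rotated above its father for as long as its count strictly exceeds the father's.

```python
class Node:
    def __init__(self, key):
        self.key, self.count = key, 0
        self.left = self.right = None


def insert(root, key):
    """Ordinary binary search tree insertion (counts start at zero)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def request(root, key):
    """Count a request for `key` and restore the monotonic property.

    Returns the (possibly new) root of the subtree.
    """
    if root is None:
        raise KeyError(key)
    if key == root.key:
        root.count += 1
        return root
    if key < root.key:
        root.left = child = request(root.left, key)
        if child.key == key and child.count > root.count:
            root.left, child.right = child.right, root    # right rotation
            return child
    else:
        root.right = child = request(root.right, key)
        if child.key == key and child.count > root.count:
            root.right, child.left = child.left, root     # left rotation
            return child
    return root


def is_monotonic(node):
    """True if no node has a smaller count than one of its sons."""
    if node is None:
        return True
    return all(son is None or (son.count <= node.count and is_monotonic(son))
               for son in (node.left, node.right))


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


root = None
for k in (4, 2, 6, 1, 3, 5, 7):
    root = insert(root, k)
for k in (7, 7, 7, 1, 1, 5):
    root = request(root, k)
print(root.key, root.count, inorder(root), is_monotonic(root))
```

The test `child.key == key` ensures that only the requested node is promoted; once it meets a father with an equal or greater count, no further rotations occur on the way back to the root.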
Asymptotically, the most probable key will become the root
of the tree (by the Law of Large Numbers, it will be requested the
most times), and each subtree will have its most probable node as its
root. The asymptotic tree will be monotonic, with probabilities as
weights. This allows us to easily calculate the asymptotic cost for this
method. Table 3.2.1 shows it averages within 15 percent of the optimum.

However, this method is very poor for some distributions.
Suppose key $k_i$ has probability $p_i$ and that the lexicographic ordering
of the keys is $k_1 < k_2 < \cdots < k_n$.

Table 3.2.1
The Performance of Monotonic Trees

                                        ZIPF'S LAW
                    English
                    Letters    #1      #2      #3      #4      #5    Average

Random Cost          5.15     7.26    7.50    7.27    7.33    7.63    7.40
Optimal Cost         3.32     4.10    3.93    4.16    4.06    3.96    4.04
Monotonic Tree
Cost (Exact)         3.77     4.91    4.18    5.32    4.68    4.20    4.66
Increase Over
Optimal              13.6%    [remaining entries illegible]

See Table 3.1.1 for explanation.




If the $p_i$ are approximately equal and $p_1 > p_2 > \cdots > p_n$,
then the skewed tree shown below will result.


Figure  3.2.1  A  worst  case  monotonic  tree. 

A theorem by Melhorne [13] shows how bad this can be.

Theorem  (Melhorne  [13]):  The  ratio  between  the  cost  of  a  monotonic 
tree  and  the  optimal  tree  may  be  as  high  as  n/(4  log  n)  for  trees  with 
n  nodes. 

This theorem depends on a very unfavorable choice for the
ordering of the keys and only gives an idea of the worst case
performance of monotonic trees. We now consider how these trees perform
on  the  average  by  assuming  the  probabilities  are  randomly  chosen  in 
some  way.  The  first  method  we  consider  chooses  the  probabilities  from 
a  given  set  of  n  probabilities.  The  second  chooses  the  probabilities 
from  some  given  probability  density  function.  We  now  investigate  the 
first  method. 

Theorem (Knuth [3, p. 432]): Given n keys and n probabilities
($p_1 \ge p_2 \ge \cdots \ge p_n$), if each of the n! assignments of probabilities to keys is
equally likely, the expected cost of a monotonic tree is
$\left[ 2 \sum_{i=1}^{n} H_i p_i \right] - 1$.
Proof: An assignment of probabilities to the keys imposes an ordering
on the probabilities. Probability $p_i$ is to the left of $p_j$ if the key
to which $p_i$ has been assigned is to the left of the key to which $p_j$ has
been assigned. We have assumed that each of the n! orderings is
equally likely. The cost of a monotonic tree is solely determined by
this ordering imposed on the probabilities. Hence, the problem is
equivalent to assigning $p_i$ to key $k_i$ and then randomly ordering the $k_i$
since each of the n! orderings on the probabilities will still be
equally likely. This restatement turns out to be simpler, and we work
with it instead.

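Let $\ell_i$ be a random variable denoting the level of $k_i$. By
definition,

\[
\mathrm{Cost} = \sum_{i=1}^{n} p_i \ell_i
\]

So

\[
E(\mathrm{Cost}) = E\Bigl( \sum_{i=1}^{n} p_i \ell_i \Bigr) = \sum_{i=1}^{n} p_i E(\ell_i)
\]

Define

\[
R_j = \begin{cases} 1 & \text{if } k_j \text{ is an ancestor of } k_i \\ 0 & \text{otherwise} \end{cases}
\]

Then $\ell_i = R_1 + R_2 + \cdots + R_{i-1} + 1$ ($R_j = 0$ if $j > i$)

\[
E(\ell_i) = E(R_1) + E(R_2) + \cdots + E(R_{i-1}) + 1
          = \sum_{j=1}^{i-1} \mathrm{Prob}(k_j \text{ is an ancestor of } k_i) + 1
\]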

To determine this probability, we discuss some properties of a
random ordering. Consider any two keys, $k_i$ and $k_j$. There are only two
distinct orderings of $k_i$ and $k_j$: $k_i$ to the left of $k_j$ and $k_i$ to the
right of $k_j$, each having probability 1/2. For either of these two
orderings, a third key can be in three different regions: to the left
of both keys, between the two keys, and to the right of both keys,
each with probability 1/3. In general, any ordering of i keys creates
i+1 regions, each having probability 1/(i+1) of containing a given key.

Consider any ordering of $k_1, \ldots, k_{j-1}$ and $k_j$. Now $k_j$ will be
an ancestor of $k_i$ if no key with probability greater than $k_j$ (that is,
$k_1, k_2, \ldots, k_{j-1}$) occurs between $k_j$ and $k_i$. For this to happen, $k_i$ must
occur in either the region to the left of $k_j$ or the region to the right.
Since there are j keys in the ordering, this probability is $\frac{2}{j+1}$.

Hence

\[
E(\ell_i) = \Bigl( \sum_{j=1}^{i-1} \frac{2}{j+1} \Bigr) + 1 = 2H_i - 1
\]

and hence

\[
E(\mathrm{Cost}) = \sum_{i=1}^{n} p_i (2H_i - 1)
\]
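For small n the theorem can be checked by brute force. The sketch below is our own illustration (the function names and the sample probabilities are assumptions, and the probabilities are taken to be distinct so that the monotonic tree is unique): it builds the monotonic tree for every one of the n! assignments, averages the costs exactly, and compares the result with $2 \sum H_i p_i - 1$.

```python
from fractions import Fraction
from itertools import permutations


def monotonic_cost(weights, level=1):
    """Cost of the monotonic tree whose keys, left to right, carry `weights`.

    The largest weight becomes the root; each side is handled recursively.
    """
    if not weights:
        return 0
    r = max(range(len(weights)), key=weights.__getitem__)
    return (weights[r] * level
            + monotonic_cost(weights[:r], level + 1)
            + monotonic_cost(weights[r + 1:], level + 1))


def expected_cost(p):
    """Average monotonic tree cost over all assignments of p to the keys."""
    perms = list(permutations(p))
    return sum(monotonic_cost(w) for w in perms) / len(perms)


def knuth_formula(p):
    """2 * sum(H_i * p_i) - 1 with p sorted in nonincreasing order."""
    h, total = Fraction(0), Fraction(0)
    for i, p_i in enumerate(sorted(p, reverse=True), start=1):
        h += Fraction(1, i)
        total += h * p_i
    return 2 * total - 1


# Hypothetical example: five distinct probabilities summing to one.
p = [Fraction(k, 15) for k in (5, 4, 3, 2, 1)]
print(expected_cost(p), knuth_formula(p))
```

Exact fractions are used so that the two values can be compared with `==` rather than with a tolerance.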

The  following  theorem  tells  us  the  cost  of  a  tree  built  by  a 
random  sequence  of  insertions. 

Theorem: Given n keys ($k_1 < k_2 < \cdots < k_n$) and a set of n probabilities
$\{p_i : 1 \le i \le n\}$, if the probabilities are randomly assigned to the keys
and then a tree is built by a random sequence of insertions, its expected
cost will be $2 \left( \frac{n+1}{n} \right) H_n - 3$ for any set of probabilities.
Proof: Let $p(k_i)$ be random variables denoting the probability chosen
for $k_i$ and let $\ell_i$ denote the level of $k_i$. As before,

\[
\mathrm{Cost} = \sum_{i=1}^{n} p(k_i) \ell_i
\]
\[
E(\mathrm{Cost}) = E\Bigl( \sum_{i=1}^{n} p(k_i) \ell_i \Bigr) = \sum_{i=1}^{n} E(p(k_i) \ell_i)
\]

The insertion sequence (and hence $\ell_i$) does not depend on $p(k_i)$. These
two random variables are independent and

\[
E(\mathrm{Cost}) = \sum_{i=1}^{n} E(p(k_i)) \, E(\ell_i) = \frac{1}{n} \sum_{i=1}^{n} E(\ell_i)
\]
\[
E(\ell_i) = 1 + \sum_{j \ne i} \mathrm{Prob}(k_j \text{ is an ancestor of } k_i)
\]

Now $k_j$ will be an ancestor of $k_i$ if and only if it occurs in the
insertion sequence before $k_i$ and any key between $k_i$ and $k_j$. This
probability is $\frac{1}{|i-j|+1}$. Therefore

\[
E(\ell_i) = 1 + \sum_{j=1}^{i-1} \frac{1}{i-j+1} + \sum_{j=i+1}^{n} \frac{1}{j-i+1}
          = 1 + [H_i - 1] + [H_{n-i+1} - 1]
\]

and

\[
E(\mathrm{Cost}) = \frac{1}{n} \sum_{i=1}^{n} \left[ H_i + H_{n-i+1} - 1 \right]
                 = \frac{2}{n} \sum_{i=1}^{n} H_i - 1
                 = \frac{2}{n} \left[ (n+1) H_n - n \right] - 1
                 = 2 \left( \frac{n+1}{n} \right) H_n - 3
\]

□

This  quantity  is  the  same  as  that  derived  by  Hibbard  [18]. 
However,  he  assumed  that  the  keys  were  equally  probable,  and  our  result 
holds  for  any  set  of  probabilities  as  long  as  they  are  randomly  assigned 
to  the  keys. 

To compare the monotonic and random tree costs, note that
if we substitute $p_i = \frac{1}{n}$ into the formula for the monotonic tree cost,
we get exactly the expression for the random tree cost:

\[
\frac{2}{n} \sum_{i=1}^{n} H_i - 1
  = \frac{2}{n} \Bigl[ (n+1) H_n - \sum_{i=1}^{n} 1 \Bigr] - 1
  = 2 \left( \frac{n+1}{n} \right) H_n - 3.
\]

Clearly, $p_i = \frac{1}{n}$ is the worst case since $p_1 \ge p_2 \ge \cdots \ge p_n$ and the coefficients of
the $p_i$ increase with i. Hence, except for the case where $p_i = \frac{1}{n}$, the
monotonic tree is better than the random tree. If some of the $p_i$ are
large, the savings can be quite substantial.
To demonstrate this, we first consider a set of probabilities
satisfying the geometric distribution, $p_i = \frac{r^i}{R}$, $r < 1$, $1 \le i \le n$, where
$R = \frac{r - r^{n+1}}{1 - r}$ is a normalizing constant. Substituting this into the
formula for the cost of a monotonic tree, we get

\[
2 \sum_{i=1}^{n} H_i p_i - 1
  = \frac{2}{R} \sum_{i=1}^{n} r^i \sum_{j=1}^{i} \frac{1}{j} - 1
  = \frac{2}{R} \sum_{j=1}^{n} \frac{1}{j} \sum_{i=j}^{n} r^i - 1
\]
\[
  = \frac{2}{R} \sum_{j=1}^{n} \frac{1}{j} \cdot \frac{r^j - r^{n+1}}{1 - r} - 1
  = \frac{2}{r - r^{n+1}} \Bigl[ \sum_{j=1}^{n} \frac{r^j}{j} - r^{n+1} \sum_{j=1}^{n} \frac{1}{j} \Bigr] - 1
\]

If n is large, this is approximately

\[
\frac{2}{r} \ln\Bigl( \frac{1}{1-r} \Bigr) - 1,
\]

a constant independent of n.

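If the probabilities satisfy Zipf's Law, the cost is

\[
2 \sum_{i=1}^{n} \frac{1}{i H_n} H_i - 1
  = \frac{2}{H_n} \sum_{i=1}^{n} \frac{H_i}{i} - 1
  = \frac{1}{H_n} \bigl( H_n^2 + H_n^{(2)} \bigr) - 1
\]

where

\[
H_n^{(2)} = \sum_{i=1}^{n} \frac{1}{i^2} < \sum_{i=1}^{\infty} \frac{1}{i^2} = \frac{\pi^2}{6}
\]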
Thus, for large n, the cost is approximately $H_n$, which is
half of the cost of the random tree. For both distributions the
monotonic tree gives significant gains over the random tree.
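The two examples can be evaluated numerically. The following sketch is illustrative only: the choices n = 1000 and r = 0.5 and all function names are our own assumptions. It computes the expected monotonic tree cost $2 \sum H_i p_i - 1$ directly for the geometric and Zipf distributions and prints it alongside the closed forms above and the random tree cost $2 \left( \frac{n+1}{n} \right) H_n - 3$.

```python
import math


def harmonic_numbers(n):
    """Return [H_1, ..., H_n] as floats."""
    h, out = 0.0, []
    for i in range(1, n + 1):
        h += 1.0 / i
        out.append(h)
    return out


def monotonic_expected_cost(p):
    """2 * sum(H_i * p_i) - 1 for p given in nonincreasing order."""
    H = harmonic_numbers(len(p))
    return 2 * sum(h * p_i for h, p_i in zip(H, p)) - 1


def random_tree_cost(n):
    return 2 * (n + 1) / n * harmonic_numbers(n)[-1] - 3


n, r = 1000, 0.5                                   # example parameters
R = (r - r ** (n + 1)) / (1 - r)                   # normalizing constant
geometric = [r ** i / R for i in range(1, n + 1)]
geo_cost = monotonic_expected_cost(geometric)
geo_approx = 2 / r * math.log(1 / (1 - r)) - 1     # large-n approximation

H_n = harmonic_numbers(n)[-1]
H2_n = sum(1.0 / i ** 2 for i in range(1, n + 1))  # second-order harmonic number
zipf = [1 / (i * H_n) for i in range(1, n + 1)]
zipf_cost = monotonic_expected_cost(zipf)
zipf_closed = (H_n ** 2 + H2_n) / H_n - 1

print(geo_cost, geo_approx)
print(zipf_cost, zipf_closed, random_tree_cost(n))
```

For the Zipf case the ratio of `zipf_cost` to `random_tree_cost(n)` approaches one half only slowly, since the difference between the two leading terms is of lower order in $H_n$.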

We  now  consider  a  method  of  selecting  the  key  probabilities 
that  has  been  studied  by  Nievergelt  and  Wong  [19].  Here  we  are  given 
a  probability  density,  f(x),  and  the  key  probabilities  are  chosen  with 
respect to that density. It is necessary to drop the requirement that
our choices must sum to one, so instead of probabilities, we must
consider key weights. The cost of a tree is now $\sum_{i=1}^{n} w_i \ell_i$ where $w_i$ is the
weight of $k_i$.

We now need two standard definitions.

Define: $E(f) = \int_{-\infty}^{\infty} x f(x) \, dx$, the mean of f(x)

Define: $F(x) = \int_{-\infty}^{x} f(y) \, dy$, the distribution function of f(x)

F(x) is the probability a number chosen according to the density
function is less than or equal to x.

The  following  theorem  defines  the  cost  of  a  monotonic  tree 
for  an  arbitrary  density  function. 

Theorem: Given n keys ($k_1 < k_2 < \cdots < k_n$) and a density function f(x),
if the weights of the keys are independently chosen from this density
function, the expected cost of the resulting monotonic tree is

\[
2E(f) \Bigl[ n H_{n-1} - \Bigl( \frac{n}{2} - 1 \Bigr) \Bigr]
  - 2 \sum_{i=1}^{n-1} \frac{n-i}{i} \int_{-\infty}^{\infty} y f(y) F(y)^i \, dy
\]

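Proof: As before, we have

\[
E(\mathrm{Cost}) = \sum_{i=1}^{n} E\Bigl( w_i \Bigl( 1 + \sum_{j \ne i} A_{ji} \Bigr) \Bigr)
                 = nE(f) + \sum_{i=1}^{n} \sum_{j \ne i} E(w_i A_{ji})
\]

where $w_i$ is the weight chosen for $k_i$ and

\[
A_{ji} = \begin{cases} 1 & \text{if } k_j \text{ is an ancestor of } k_i \\ 0 & \text{if not} \end{cases}
\]

Note that $w_i$ and $A_{ji}$ are not independent.

\[
E(w_i A_{ji}) = \int_{-\infty}^{\infty} y \, \mathrm{Prob}(w_i A_{ji} = y) \, dy
\]

$A_{ji}$ can just equal 0 or 1, and if $A_{ji} = 0$ the only y having
nonzero probability is y = 0. Since this will be multiplied by y = 0,
the case with $A_{ji} = 0$ can be ignored and

\[
E(w_i A_{ji}) = \int_{-\infty}^{\infty} y \, \mathrm{Prob}(w_i = y \text{ and } A_{ji} = 1) \, dy
\]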


To determine Prob($w_i = y$ and $A_{ji} = 1$) we note that $k_j$ will
be an ancestor of $k_i$ if and only if $w_j \ge w_i$ and $w_j$ is greater than the
weight of any key between $k_i$ and $k_j$ in the ordering on the keys. The
probability that $w_i = y$ is f(y)dy. We then choose an $x \ge y$ for $w_j$.
Any specific x is chosen with probability f(x)dx. For this x, we must
choose the |i-j|-1 = m keys between $k_i$ and $k_j$ to have weight less than or
equal to x. The probability for this is $F(x)^m$. The product of these
must be integrated over $x \ge y$, giving

\[
\mathrm{Prob}(w_i = y \text{ and } A_{ji} = 1)
  = f(y) \int_{y}^{\infty} f(x) F(x)^m \, dx \, dy
  = f(y) \, \frac{1 - F(y)^{m+1}}{m+1} \, dy,
\]

since $\frac{dF}{dx} = f(x)$ and $F(\infty) = 1$. Then

\[
E(w_i A_{ji}) = \int_{-\infty}^{\infty} y f(y) \Bigl[ \frac{1 - F(y)^{m+1}}{m+1} \Bigr] dy.
\]


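Note that this quantity depends only on m, and not the values of i and j.
Since there are 2(n-m-1) distinct ordered (i,j) pairs having a given
value of m,

\[
E(\mathrm{Cost}) = nE(f) + \sum_{m=0}^{n-2} 2(n-m-1) \int_{-\infty}^{\infty} y f(y) \Bigl[ \frac{1 - F(y)^{m+1}}{m+1} \Bigr] dy
\]
\[
= nE(f) + 2 \sum_{m=0}^{n-2} \frac{n-m-1}{m+1} \int_{-\infty}^{\infty} y f(y) \, dy
  - 2 \sum_{m=0}^{n-2} \frac{n-m-1}{m+1} \int_{-\infty}^{\infty} y f(y) F(y)^{m+1} \, dy
\]
\[
= 2E(f) \Bigl[ n H_{n-1} - \Bigl( \frac{n}{2} - 1 \Bigr) \Bigr]
  - 2 \sum_{m=1}^{n-1} \frac{n-m}{m} \int_{-\infty}^{\infty} y f(y) F(y)^m \, dy
\]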
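As a sanity check, the formula can be compared with an exact enumeration for one particular density. The sketch below is our own illustration and rests on two assumptions that are not part of the text: the density is uniform on [0, 1], so that $E(f) = \frac{1}{2}$ and $\int y f(y) F(y)^i \, dy = \frac{1}{i+2}$, and the enumeration uses the standard facts that the ranks of n independent continuous samples form a uniformly random permutation independent of the sorted values, and that the k-th smallest of n uniform samples has mean $\frac{k}{n+1}$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def monotonic_cost(weights, level=1):
    """Cost of the monotonic tree whose keys, left to right, carry `weights`."""
    if not weights:
        return 0
    r = max(range(len(weights)), key=weights.__getitem__)
    return (weights[r] * level
            + monotonic_cost(weights[:r], level + 1)
            + monotonic_cost(weights[r + 1:], level + 1))


def theorem_uniform(n):
    """The theorem's expression evaluated for the uniform density on [0, 1]."""
    h = sum(Fraction(1, k) for k in range(1, n))           # H_{n-1}
    first = n * h - (Fraction(n, 2) - 1)                   # 2 E(f) [...] with E(f) = 1/2
    second = 2 * sum(Fraction(n - i, i * (i + 2)) for i in range(1, n))
    return first - second


def enumerated_uniform(n):
    """Exact expected cost: average over rank permutations, with the
    k-th smallest weight replaced by its mean k/(n+1)."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(monotonic_cost(tuple(Fraction(k, n + 1) for k in perm))
                for perm in perms)
    return total / len(perms)


for n in range(2, 7):
    print(n, theorem_uniform(n), enumerated_uniform(n))
```

For n = 2 both expressions give 4/3, which is the mean of the larger weight plus twice the mean of the smaller one.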


Nievergelt  and  Wong  give  us  two  measures  to  which  we  can 
compare  this  cost. 


Theorem (Nievergelt and Wong [19]): The cost of the optimal tree whose
weights are chosen according to f(x) is E(f) n log n + O(n).

Theorem (Nievergelt and Wong [19]): The cost of a random tree whose
weights are chosen according to f(x) is (2 ln 2) E(f) n log n + O(n).

Nievergelt  and  Wong  also  considered  choosing  the  weights  for 
a  monotonic  tree  from  a  uniform  distribution.  The  resulting  cost  was 
(2 ln 2) E(f) n log n + O(n), asymptotically equal to that of the
random tree. They conjectured that this held for any probability
distribution. We now show this conjecture to be true. I am grateful
to D. L. Burkholder for the proof of the following lemma:


Lemma: For any density function f with a finite mean,

\[
\sum_{i=1}^{n-1} \frac{n-i}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy = o(n \log n)
\]


Proof: First note that for any $i \ge 1$,

\[
|y f(y) F^i(y)| \le |y f(y)|
\]

Hence Lebesgue's Dominated Convergence Theorem (see [21]) applies and
we have

\[
\lim_{i \to \infty} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy
  = \int_{-\infty}^{\infty} y f(y) \lim_{i \to \infty} F^i(y) \, dy = 0
\]

since $\lim_{i \to \infty} F^i(y) = 0$ if $F(y) < 1$ and $f(y) = 0$ if $F(y) = 1$.


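Now,

\[
\sum_{i=1}^{n-1} \frac{n-i}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy
  < n \sum_{i=1}^{n-1} \frac{1}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy
  \qquad (1)
\]

We now choose N such that

\[
\int_{-\infty}^{\infty} y f(y) F^i(y) \, dy < \epsilon \quad \text{for } i \ge N.
\]

Putting this in (1) gives

\[
< n \sum_{i=1}^{N-1} \frac{1}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy + n \sum_{i=N}^{n-1} \frac{\epsilon}{i}
\]
\[
< n \sum_{i=1}^{N-1} \frac{E(f)}{i} + n \sum_{i=N}^{n-1} \frac{\epsilon}{i}
\]

Since $H_x < \ln x + 1$, we have

\[
< n (\ln(N-1) + 1) E(f) + n \epsilon (\ln(n-1) + 1)
\]

Therefore

\[
\frac{\sum_{i=1}^{n-1} \frac{n-i}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy}{n \log n}
  < \frac{n (\ln(N-1) + 1) E(f) + n \epsilon (\ln(n-1) + 1)}{n \log n}
\]
\[
= \frac{(\ln(N-1) + 1) E(f)}{\log n} + \frac{\epsilon (\ln(n-1) + 1)}{(\log e)(\ln n)}
\]
\[
< \frac{(\ln(N-1) + 1) E(f)}{\log n} + \frac{\epsilon}{\log e} + \frac{\epsilon}{(\log e)(\ln n)}
\]

We can make the first and third terms arbitrarily small (say, less than
$\epsilon$) by choosing n sufficiently large. Therefore,

\[
\frac{\sum_{i=1}^{n-1} \frac{n-i}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy}{n \log n}
  < \Bigl( 2 + \frac{1}{\log e} \Bigr) \epsilon
\]

for n > N'. Therefore the limit of this ratio is zero as $n \to \infty$ and the
lemma is proved. □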

Theorem: If n keys have their weights chosen according to any density
function with finite mean, the expected cost of a monotonic tree is
(2 ln 2) E(f) n log n + O(n), asymptotically equal to the cost of a tree
built from a random insertion sequence.

Proof: The cost of a monotonic tree is

\[
2E(f) \Bigl[ n H_{n-1} - \Bigl( \frac{n}{2} - 1 \Bigr) \Bigr]
  - 2 \sum_{i=1}^{n-1} \frac{n-i}{i} \int_{-\infty}^{\infty} y f(y) F^i(y) \, dy
\]

The first term is asymptotically equal to 2E(f) n ln n = (2 ln 2) E(f) n log n.
The final term was shown by the lemma to be o(n log n). Hence the
asymptotic cost is (2 ln 2) E(f) n log n.

We now show that the cost of a monotonic tree is less than or
equal to that of a random tree, proving that the cost of a monotonic
tree equals (2 ln 2) E(f) n log n + O(n).




The method we are using to select key weights chooses n
weights  independently  from  a  density  function.  An  equivalent  method 
first  selects  a  set  of  n  weights  from  an  n-dimensional  density  function. 
This  function  is  constructed  so  that  the  probability  of  choosing  a 
given  set  equals  the  probability  of  obtaining  it  (in  any  order)  from  n 
selections  from  the  original  function.  We  then  choose  a  permutation  of 
the  set. 

Now  consider  any  set.  We  have  already  studied  the  case  where 
the  key  probabilities  (easily  generalized  to  include  key  weights)  were 
selected  from  a  set  and  found  the  expected  cost  of  a  monotonic  tree  to 
be  less  than  or  equal  to  that  of  a  random  tree.  Since  this  holds  for 
every set in the n-dimensional probability density, the theorem is
proved. □

Finally,  we  cite  the  results  of  a  simulation  run  by  Walker 
and  Gottlieb  [14]  that  showed  the  performance  of  monotonic  trees  to 
be  poor.  They  state  that  although  these  poor  results  are  partially 
explained  by  the  fact  that  the  leaf  weights  cannot  influence  the  structure 
of  the  tree,  even  the  tests  with  all  leaf  weights  equal  to  zero  did  not 
produce  acceptable  nearly  optimum  trees. 

Indeed  the  majority  of  the  results  concerning  monotonic  trees 
are  quite  discouraging.  This  method  performs  well  only  when  we  are 
guaranteed that the key probabilities will differ significantly from a
uniform  distribution  (i.e.,  have  low  entropy).  If  this  is  not  the  case 
(as  in  the  situation  described  by  Nievergelt  and  Wong),  the  performance 
is  asymptotically  the  same  as  randomly  built  trees. 
3.3 Cost Balanced Trees

The  previous  methods  have  focused  on  the  fact  that  a  rotation 
moves  a  certain  node  up  in  the  tree,  ignoring  the  fact  that  it  also  dis- 
turbs two  (possibly  large)  subtrees.  The  method  of  cost  balancing 
considers  the  entire  tree.  We  do  a  rotation  only  when  it  appears  to  be 
profitable,  that  is,  when  the  number  of  accesses  to  the  nodes  that  will 
move  up  exceeds  the  number  to  those  that  will  move  down. 

This  method  has  the  advantage  that  it  is  possible  to  do  the 
rebalancing  during  the  search  for  the  requested  key  since  we  know  in 
which subtree it lies. For example in Figure 3.3.1, we perform a
rotation to promote A, if $w(A) + w(S_1) > w(B) + w(C) + w(S_3) + w(S_4)$.
We promote C if $w(C) + w(S_4) > w(B) + w(A) + w(S_1) + w(S_2)$. Here, w(A)
is the number of times A has been requested, and $w(S_1)$ is the number of
times any node in $S_1$ has been requested. All this information is
available at node B, and any rebalancing can be done there.
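The test performed at node B can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation used in the simulations: the `Node` fields (`count` for a node's own requests, `total` for the weight of its whole subtree) and the function names are hypothetical, leaf weights are ignored, and the integration of the test into the search loop is not shown. A rotation is performed only when the weight that would move up exceeds the weight that would move down.

```python
class Node:
    def __init__(self, key, count=0, left=None, right=None):
        self.key, self.count = key, count
        self.left, self.right = left, right
        self.total = count + weight(left) + weight(right)


def weight(node):
    """Total number of requests to the subtree rooted at `node`."""
    return node.total if node else 0


def refresh(node):
    node.total = node.count + weight(node.left) + weight(node.right)


def rebalance(b):
    """Apply one profitable single rotation at b, if any; return the new root."""
    a, c = b.left, b.right
    # Promote A if w(A) + w(S1) > w(B) + w(C) + w(S3) + w(S4).
    if a and a.count + weight(a.left) > b.count + weight(c):
        b.left, a.right = a.right, b            # right rotation
        refresh(b)
        refresh(a)
        return a
    # Promote C if w(C) + w(S4) > w(B) + w(A) + w(S1) + w(S2).
    if c and c.count + weight(c.right) > b.count + weight(a):
        b.right, c.left = c.left, b             # left rotation
        refresh(b)
        refresh(c)
        return c
    return b


def weighted_cost(node, level=1):
    """Sum over nodes of (requests to node) * (level of node)."""
    if node is None:
        return 0
    return (node.count * level
            + weighted_cost(node.left, level + 1)
            + weighted_cost(node.right, level + 1))


# Hypothetical example: A (key 2) and its left subtree are heavily requested.
b = Node(4, 1, left=Node(2, 5, left=Node(1, 3)), right=Node(6, 1))
before = weighted_cost(b)
root = rebalance(b)
after = weighted_cost(root)
print(root.key, before, after)
```

The two conditions cannot hold simultaneously, so the order in which they are tested does not matter. The decrease in `weighted_cost` equals the difference between the two sides of the inequality that triggered the rotation.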


Figure  3.3.1 
Another advantage of this rule is that the leaf weights play
a role in the balancing of the tree. Since a rotation that promotes
the nodes in subtree $S_1$ also must promote the leaves, the "weight" of
$S_1$ must be the number of accesses to both the nodes and the leaves of
$S_1$. If the weight of the leaves is considerable, this is a significant
advantage over previous rules, all of which ignored accesses to leaves.

However,  balancing  at  one  node  may  cause  other  nodes  to 
become  unbalanced.  (See  below). 




Figure  3.3.2 


Here,  both  node  A  and  node  C  may  require  rebalancing.  (However,  no 
rebalancing  would  be  required  at  node  A  if  node  C  had  been  its  left 
son).  An  attempt  to  correct  these  imbalances  (and  all  the  imbalances 
resulting  from  the  corrections)  could  be  quite  costly.  A  more  rea- 
sonable policy  is  to  ignore  the  imbalances  and  rebalance  at  a  later 
request  when  the  search  path  passes  through  the  unbalanced  node. 

Tables  3.3.1  and  3.3.2  compare  these  two  rules.  The  total 
rotation rule (which corrects all imbalances) has a slightly lower cost.


Table 3.3.1

The Performance of the Limited Single
Rotation (LSR) Rule

                                        ZIPF'S LAW
                    English
                    Letters    #1      #2      #3      #4      #5    Average

Random Cost          5.15     7.26    7.50    7.27    7.33    7.63    7.40
Optimal Cost         3.32     4.10    3.93    4.16    4.06    3.96    4.04
LSR Cost             3.44     4.33    4.14    4.46    4.28    4.20    4.28
Increase Over
Optimum              3.55%    5.46%   5.31%   7.29%   5.42%   5.98%   5.89%
Average Number of
Rotations/Request    .111     .199    .204    .199    .197    .200    .200
Average Over the
Last 100 Requests             .033    .041    .040    .039    .034    .038

See Table 3.1.1 for explanation.


Table 3.3.2

The Performance of the Total Single
Rotation (TSR) Rule

                                        ZIPF'S LAW
                    English
                    Letters    #1      #2      #3      #4      #5    Average

Random Cost          5.15     7.26    7.50    7.27    7.33    7.63    7.40
Optimal Cost         3.32     4.10    3.93    4.16    4.06    3.96    4.04
TSR Cost             3.41     4.33    4.11    4.41    4.22    4.17    4.25
Increase Over
Optimum              2.93%    5.57%   4.57%   6.02%   3.92%   5.27%   5.07%
Average Number of
Rotations/Request    .113     .220    .219    .217    .209    .213    .215
Average Over the
Last 100 Requests             .040    .048    .044    .036    .036    .041

See Table 3.1.1 for explanation.
It gives an increase of 5.07 percent over the optimal cost (as compared with
5.89 percent for the limited rotation rule) with a surprisingly small
increase in the number of rotations required (an average of .215 per
request as compared with .200). However, there is much more overhead
associated with a rotation in the total rotation rule. Since imbalances
can propagate throughout the tree, either a pointer to a node's
father must be maintained or we must stack the nodes encountered during
the search for the requested key.

These  tables  also  show  how  much  work  the  rules  do  after  many 
requests. We consider the last 100 requests out of 500 in the simulation.
The limited rotation rule does an average of .038 rotations per
request during this period, or approximately one rotation every 27
requests. The total rotation rule averages .041 rotations per request,
or one rotation every 24 requests.

A weakness of these rotation rules is that they do not consider
the "inside" subtrees (the right subtree of a node's left son, or
the left subtree of its right son; see Figure 3.3.3).


Figure 3.3.3 The inside subtrees of node A are darkened.

A rotation can promote either exterior subtree, but the interior
subtrees remain at the same level. This can lead to very poor trees that
are still "stable" in that no rotation can be performed. Figure 3.3.4
shows an example. This tree is stable as long as the weight of a node
is less than or equal to that of its father.


Figure  3.3.4 

While the worst case performance on such a distribution is
quite bad, a simulation suggests the average case is acceptable. The
probability distribution was $p_1, p_2, \ldots, p_{25}$ (values
illegible in this copy), $p_{26} = \frac{1}{1275}$,
$p_{27} = \frac{3}{1275}$, $\ldots$, $p_{50} = \frac{49}{1275}$. The
tree shown in Figure 3.3.4 is stable for this probability distribution.
Yet, after 500 requests the limited rotation rule reduced the cost to
4.7593, a mere 3.06 percent increase over the optimal cost of 4.6180.

The limited single rotation rule has several desirable
features. The necessary rotations can be performed during the search
for the requested key. The performance for the distributions we
considered was good, within 5.89 percent of the optimal. After an initial
period that reorganizes the tree, the rule required a rotation
approximately every 27 requests, a very low maintenance cost.

The total single rotation rule has little more to offer. It
decreases the cost to within 5.07 percent of the optimum and,
surprisingly, does only slightly more rotations. However, the overhead
required by this rule to allow changes to propagate throughout the tree
is not justified by the relatively small decrease in cost, making the
limited rotation rule a better choice.

3.4  Double  Rotations 

A transformation (called a "double rotation") that allows the
promotion of inside subtrees is shown in Figure 3.4.1.


Figure 3.4.1 A double rotation.

A rule that uses both single and double rotations has several
advantages over a rule that is limited to single rotations. First, such
a rule will always be able to promote a requested node if it is
profitable to do so. Single rotations can be used to promote nodes in
the outside subtrees and double rotations for those in the inside
subtrees.

A  double  rotation  actually  consists  of  two  successive  single 
rotations,  each  promoting  node  B  one  level.  The  double  rotation  has  an 
advantage  when  doing  both  of  these  rotations  will  reduce  the  cost,  but 
doing  only  the  first  will  not.  A  rule  that  is  restricted  to  performing 
single  rotations  will  check  if  the  first  rotation  can  be  done.  Since  it 
cannot  be,  the  tree  is  left  unchanged,  and  the  second  rotation  is  not 
considered.  A  rule  which  also  considers  double  rotations  will  be  able 
to  reduce  the  cost  in  this  situation. 
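
The situation described above can be sketched in Python. The example is
illustrative only: the node names A and C, the counter values 5, 4, 3,
and the cost measure (each counter multiplied by the depth of its node,
with the root at depth 1) are assumptions made for this sketch, not
data from the thesis. B is the inside grandson being promoted.

```python
class Node:
    def __init__(self, name, weight, left=None, right=None):
        self.name, self.weight = name, weight
        self.left, self.right = left, right


def rotate_right(a):
    """Single rotation promoting a.left one level; returns the new root."""
    c = a.left
    a.left, c.right = c.right, a
    return c


def rotate_left(c):
    """Single rotation promoting c.right one level; returns the new root."""
    b = c.right
    c.right, b.left = b.left, c
    return b


def cost(node, depth=1):
    """Sum of weight * depth over the subtree (assumed cost measure)."""
    if node is None:
        return 0
    return (node.weight * depth
            + cost(node.left, depth + 1)
            + cost(node.right, depth + 1))


def build():
    """A is the root, C its left son, B the right (inside) son of C."""
    return Node("A", 5, left=Node("C", 4, right=Node("B", 3)))


initial_cost = cost(build())

# First single rotation only: B moves above C.
tree = build()
tree.left = rotate_left(tree.left)
first_only_cost = cost(tree)

# Second single rotation: B moves above A, completing the double rotation.
tree = rotate_right(tree)
double_cost = cost(tree)
```

With these assumed counters the first rotation alone raises the cost
from 22 to 23, so a rule limited to single rotations leaves the tree
unchanged, while the completed double rotation lowers it to 21.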

Table 3.4.1 shows the cost of a rule that uses both single
and double rotations. For the distributions considered, this method
averaged within 3.84 percent of the optimal cost. The total number of
rotations required per request (counting both single and double
rotations) is .208, which is very close to the averages for the limited
rotation rule (.200) and the total rotation rule (.215).

However, after many requests, fewer rotations are required
than for either single rotation rule. In fact, the average over the
last 100 requests was .027 single rotations (one every 36 requests) and
.008 double rotations (one every 129 requests).


Table 3.4.1

The Performance of the Double Rotation (DR) Rule

                        English                ZIPF'S LAW
                        Letters    #1      #2      #3      #4      #5    Average

Random Cost              5.15     7.26    7.50    7.27    7.33    7.63    7.40
Optimal Cost             3.32     4.10    3.93    4.16    4.06    3.96    4.04
DR Cost                  3.40     4.29    4.06    4.32    4.23    4.09    4.20
Increase Over Optimum    2.35%    4.62%   3.45%   3.80%   4.11%   3.22%   3.84%
Average Number of Single
  Rotations/Request      .074     .129    .129    .127    .126    .126    .127
Average Over Last
  100 Requests                    .029    .027    .025    .028    .028    .027
Average Number of Double
  Rotations/Request      .037     .082    .081    .082    .079    .080    .081
Average Over Last
  100 Requests                    .007    .007    .008    .007    .009    .008

See Table 3.1.1 for explanation.

These results are supported by a simulation run by Baer [20].
He assumed all keys to be equally likely and found the cost of the
double rotation rule to range from 1.2 percent to 3.6 percent above the
optimum. This cost is lower than that we obtained because more
requests were made in Baer's simulation. In addition, his trees have
fewer nodes. This would also explain the lower cost since the cost of
smaller trees tends to be closer to the optimum (see Table 2.9.1;
compare the cost for the English letters (26 nodes) with the others
(100 nodes)).

Baer also gives statistics on the number of rotations done by
this rule. Again his results agree with ours. His most extensive
simulation (85,000 requests) required 23 rotations for the first 850
requests (one every 37 requests). The next 7,650 requests caused 6
rotations (one every 1,275 requests), and the final 76,500 also caused
6 rotations (one every 12,750 requests). These results indicate that the
cost of "maintaining" the tree might be extremely low.

Bruno  and  Coffman  [12]  have  considered  an  extension  of  this 
rule  that  can  promote  a  node  any  number  of  levels  by  using  a  sequence 
of  rotations.  They,  however,  were  concerned  with  an  algorithm  to  build 
a  nearly  optimal  tree  from  a  set  of  known  key  probabilities  and  used  this 
set  of  transformations  to  reduce  the  cost  of  the  initial  tree.  Every 
final  tree  in  their  simulation  was  within  5  percent  of  the  optimum,  and 
the  average  was  within  2.6  percent. 

This suggests further rules, where we consider promoting the
requested node i levels for i = 1, 2, ..., k, where k is a parameter of
the rule. Note that the single rotation rules have k = 1, and the
double rotation rule has k = 2. Increasing k will increase the work
the rule must do, but will result in decreased retrieval times. The
results of Bruno and Coffman suggest that the retrieval time will not
be greatly improved by increasing k beyond 2, while the increase in
the complexity of the algorithm to execute the rule would be substantial.

3.5  Summary  and  Conclusion 

We have examined several methods for dynamically altering
binary search trees to decrease their access time. The first two
methods were analogs of rules for the linked list: the move to root rule
and the move up one rule. The move to root rule was shown to be
identical to the first request rule (analogous to the case of the linked
list) and a formula for the cost was derived. Calculations showed this
method to be an improvement over a tree built by a random sequence of
insertions. The analogy breaks down when we consider the move up one
rule; it is often outperformed by the move to root rule. A simulation
showed the move to root rule to have lower average cost than the move
up one rule (within 38 percent of the optimum compared with 45 percent),
indicating that of these two rules, the move to root rule would be the
superior choice. However, these rules should be used only if we cannot
associate a counter with each key. If we can, the following rules will
give better performance.

We next considered rules that use counters. The first of
these was the monotonic tree rule. Its performance was found to be
disappointing. Melhorne [13] has shown that the ratio of the cost of
a monotonic tree to that of the optimal tree may be as high as
n/(4 log n), for a tree of n nodes. If the weights of the nodes are
chosen according to a probability density function (a case considered
by Nievergelt and Wong [19]) the performance is asymptotically the
same as a random tree for any probability distribution. A simulation
by Walker and Gottlieb [14] also confirms the poor performance of this
method. Only if we assume the probabilities are chosen from a fixed
set (guaranteeing they will be "spread out") does this method
significantly improve over the cost of a random tree. A formula was derived
for this case, and significant decreases were obtained for Zipf's Law
and the geometric distribution. This assumption was also true in the
simulation we discussed. It showed the cost of this method to average
within 15 percent of the optimal cost.

We then discussed the most promising methods. Simulations
showed that the cost of the limited single rotation rule averaged
within 5.89 percent of the optimum. However, the worst case performance
was very bad, resulting in the tree shown in Figure 3.3.4. The total
single rotation rule reduced the average cost to within 5.07 percent of
the optimum. However, since this rule requires much more overhead, this
small decrease in cost does not justify its use.

Finally, we considered the double rotation rule. Its cost
averaged within approximately 3.84 percent of the optimum. Though this
rule must check for both single and double rotations, it averages a
single rotation every 36 requests and a double rotation every 129 after
the initial period of reorganization. Compared with the limited single
rotation rule (one rotation every 27 requests), the double rotation rule
does less work after the initial period. It then appears to be the best
choice of the counter rules.

4.  CONCLUSION

The  purpose  of  this  thesis  was  to  examine  various  heuristics 
that  dynamically  alter  data  structures  by  moving  frequently  accessed 
keys  near  the  "top"  of  the  data  structure. 

The  first  data  structure  we  considered  was  the  linked  list. 
If  relatively  few  requests  (compared  to  the  number  of  keys)  are 
anticipated,  the  fast  convergence  of  the  move  to  front  rule  (nearly  as 
fast  as  the  optimum)  makes  it  the  best  choice.  If  many  requests  are 
expected, the transposition rule gives the best performance because
its asymptotic cost is close to the optimum (within 10 percent). For an
intermediate  number  of  requests,  the  first  request/transposition  rule 
combines  both  of  these  features  with  a  small  additional  overhead, 
making  it  the  best  choice  in  this  case. 

If  space  is  available  for  counters,  there  are  much  better 
rules.  If  enough  space  is  available  so  that  the  counters  will  never 
overflow,  the  frequency  count  rule  should  be  used.  If  this  is  not  the 
case,  the  limited  difference  rule  uses  whatever  space  can  be  spared  and 
gives  nearly  optimal  results  for  even  a  small  number  of  bits. 

The second data structure we considered was the binary
search tree. Here we found the move to root rule to give the best
performance of the rules that do not use counters, within approximately
38 percent of the optimum. If counters are available, the double rotation
rule appears to be best. Its performance averages within 3.84 percent of
the optimum, and after a period of initial organization of the tree, it
is very inexpensive to execute. On the average, a single rotation is
done every 36 requests, and a double rotation every 129 requests.

The  methods  we  have  considered  are  simple  and  inexpensive 
to  execute.  In  addition,  they  significantly  reduce  the  average  access 
time  over  data  structures  in  which  the  keys  are  randomly  arranged;  in 
some  cases  these  methods  keep  the  structure  very  close  to  the  one  of 
optimum cost.

APPENDIX

We  will  make  great  use  of  Markov  chains,  so  a  summary  of 
their  important  properties  is  required.  These  can  all  be  found  in  [1]. 

To define a Markov chain, we first consider a set $S$ of states
and a sequence $\{x_n, n = 0, 1, \ldots\}$ of random variables which
take their values from $S$. The value $x_n$ tells us which state the
chain is "in" at time $n$.

In addition, the Markov property must be satisfied. This is

$$\mathrm{Prob}\{x_{n+1} = S_{n+1} \mid x_0 = S_0, \ldots, x_n = S_n\} = \mathrm{Prob}\{x_{n+1} = S_{n+1} \mid x_n = S_n\},$$

which says that the probability of being in a given state depends only
on the previous state, and not any before that. We then define the
probabilities $\mathrm{Prob}\{x_{n+1} = j \mid x_n = i\}$ (or just
$P(i,j)$) as the transition probabilities of the chain and can form a
matrix whose $(i,j)$ element is
$\mathrm{Prob}\{x_{n+1} = j \mid x_n = i\}$. This is called the
transition matrix $P$. If we are given a probability distribution $x$
over the states, then $xP$ gives us the probability distribution at the
next time step.
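
As a small illustration of the product $xP$, the following Python sketch
advances a distribution by one time step. The two-state matrix and the
function name step are hypothetical, chosen only for this example.

```python
def step(x, P):
    """Return the row vector x P: the distribution one time step later."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical two-state chain; row i holds P(i, j) for each j.
P = [[0.5, 0.5],
     [0.25, 0.75]]

x0 = [1.0, 0.0]      # start in state 0 with certainty
x1 = step(x0, P)     # distribution after one transition
x2 = step(x1, P)     # distribution after two transitions
```

Each row of P sums to 1, so every distribution produced this way again
sums to 1.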

This  defines  the  basic  idea  of  Markov  chains.  Several  more 
definitions  are  required. 

Definition: A state $x$ leads to a state $y$ if there exists a sequence
of states $x_1, \ldots, x_n$ such that

$$P(x, x_1)\,P(x_1, x_2) \cdots P(x_n, y) > 0.$$

Definition: A set $C$ of states is closed if no state in $C$ leads to a
state outside of $C$.

Definition: A closed set $C$ is irreducible (also called ergodic) if $x$
leads to $y$ for all choices of $x$ and $y$ in $C$.

Most of the chains we are dealing with will be irreducible. That is,
the set of all states, S, will be irreducible. S is then closed since
there are no states outside of S. For these chains, it will be clear
that some sequence of requests can be designed to cause any state to
lead to any other state.

Definition: Define the period of a state $x$ as the greatest common
divisor (g.c.d.) of the set $\{n \ge 1 : P^n(x,x) > 0\}$, where
$P^n(x,x)$ is the $(x,x)$ element of the $n$th power of $P$. It can be
shown that all states of an irreducible chain have equal periods and
this is defined as the period of the chain. A chain with a period of 1
is called aperiodic.

Nearly all of the chains we deal with will be aperiodic. If the top
element in a configuration is requested, none of these schemes will
alter the configuration and hence we will have $P(x,x) > 0$. Hence 1 is
in the set $\{n \ge 1 : P^n(x,x) > 0\}$, so the g.c.d. must be 1 and the
chain must be aperiodic.
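
The definition of the period can be checked numerically on small chains.
The sketch below is an illustration under stated assumptions: the
function name period is hypothetical, and the search over return times
is cut off at an arbitrary bound (max_steps), so it is only a finite
approximation of the g.c.d. over all n.

```python
from math import gcd


def period(P, x, max_steps=None):
    """g.c.d. of the return times n <= max_steps with P^n(x, x) > 0."""
    n = len(P)
    if max_steps is None:
        max_steps = 2 * n * n          # assumed cutoff, ample for tiny chains
    current = {x}                      # states reachable in exactly `steps` steps
    g = 0
    for steps in range(1, max_steps + 1):
        current = {j for i in current for j in range(n) if P[i][j] > 0}
        if x in current:
            g = gcd(g, steps)          # gcd(0, k) == k, so the first hit sets g
    return g


two_cycle = [[0, 1],
             [1, 0]]                   # returns only at even times
with_self_loop = [[0.5, 0.5],
                  [1, 0]]              # P(0, 0) > 0, as when the top element is requested
three_cycle = [[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]]

periods = (period(two_cycle, 0), period(with_self_loop, 0), period(three_cycle, 0))
```

The self-loop on state 0 in the second chain plays the role of a request
for the top element: it puts 1 in the set of return times and forces the
period to 1.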

Definition: A steady state distribution (also called a stationary
distribution) is defined as any probability distribution $x$ over the
states such that $xP = x$.

Note that we can easily determine this distribution by solving
the system $x(P - I) = 0$, subject to the entries of $x$ summing to 1.
The following theorem shows that this distribution tells us the
asymptotic behavior of the chain.

Theorem: Any closed and irreducible chain with a finite number of
states has a unique steady state distribution. If the chain is
aperiodic, it approaches the steady state distribution from any initial
distribution. If the chain has period $d$, then for $0 < i \le d$, the
chain $x_i, x_{i+d}, x_{i+2d}, \ldots$ has a unique steady state
distribution and approaches it.
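
For a small irreducible chain, the unique steady state distribution can
be found by solving $x(P - I) = 0$ together with the condition that the
entries of $x$ sum to 1. The Python sketch below does this with exact
rational arithmetic; the function name steady_state, the use of
Gauss-Jordan elimination, and the two-state example matrix are choices
made for this illustration, not part of the thesis.

```python
from fractions import Fraction


def steady_state(P):
    """Solve x P = x, sum(x) = 1 exactly (assumes a unique solution)."""
    n = len(P)
    # Equation j (for j < n-1): sum_i x_i * (P[i][j] - [i == j]) = 0.
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(n)]
         for j in range(n - 1)]
    b = [Fraction(0)] * (n - 1)
    # The last equation replaces a redundant one: the entries sum to 1.
    A.append([Fraction(1)] * n)
    b.append(Fraction(1))
    # Gauss-Jordan elimination with row swaps for zero pivots.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [v * inv for v in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return b


# Hypothetical two-state chain.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
x = steady_state(P)

# Check the defining property x P = x.
xP = [sum(x[i] * P[i][j] for i in range(2)) for j in range(2)]
```

One of the n equations in $x(P - I) = 0$ is always redundant because the
rows of P sum to 1, which is why the sketch swaps it for the
normalization condition.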

Another useful theorem that characterizes asymptotic behavior
is the ergodic theorem.

Theorem [Ergodic Theorem]: Let $N_t(s)$ be a random variable denoting
the number of times the chain has been in state $s$ after $t$
transitions. Suppose that $s$ has steady state probability $p(s)$. Then
if the chain is closed and irreducible (ergodic),

$$\lim_{t \to \infty} \frac{N_t(s)}{t} = p(s).$$

Note the ergodic theorem holds for both periodic and aperiodic
chains.
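
The statement of the ergodic theorem can be illustrated by simulation.
The following Python sketch counts visits along one sample path; the
function name visit_fractions, the seed, the run lengths, and both
example chains are assumptions made for this illustration. The first
chain is aperiodic with steady state probabilities (1/3, 2/3); the
second has period 2 and steady state probabilities (1/2, 1/2).

```python
import random


def visit_fractions(P, start, t, seed=0):
    """Simulate t transitions and return N_t(s) / t for every state s."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    states = range(len(P))
    counts = [0] * len(P)
    state = start
    for _ in range(t):
        # Draw the next state according to row `state` of P.
        state = rng.choices(states, weights=P[state])[0]
        counts[state] += 1
    return [c / t for c in counts]


aperiodic = [[0.5, 0.5],
             [0.25, 0.75]]
periodic = [[0, 1],
            [1, 0]]

aperiodic_fractions = visit_fractions(aperiodic, 0, 200_000)
periodic_fractions = visit_fractions(periodic, 0, 1_000)
```

The periodic chain never settles into a limiting distribution at any
single time step, yet its visit fractions still match the steady state
probabilities, as the note above says.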

The chains we are dealing with have a cost associated with
each state. Let the probability of being in state $s$ at time $t$ be
$p_t(s)$, and suppose $s$ has steady state probability $p(s)$ and cost
$c(s)$. Finally define the cost of the chain at time $t$
($\mathrm{COST}_t$) as

$$\mathrm{COST}_t = \sum_{s \in S} p_t(s)\,c(s).$$

For an aperiodic irreducible chain, we have
$\lim_{t \to \infty} p_t(s) = p(s)$. Hence

$$\lim_{t \to \infty} \mathrm{COST}_t = \sum_{s \in S} p(s)\,c(s).$$

For any irreducible chain, we can use the Ergodic Theorem to show

$$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \mathrm{COST}_i = \sum_{s \in S} p(s)\,c(s).$$

Therefore $\sum_{s \in S} p(s)\,c(s)$ determines the asymptotic cost of
the chain. We also use the following theorem.
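
As a worked instance of the sum $\sum_{s \in S} p(s)c(s)$, the Python
sketch below treats a two-key linked list under the move to front rule
as a two-state chain. The request probabilities 3/4 and 1/4 are
hypothetical, and the cost of a state is taken to be the expected number
of comparisons for one request; both are assumptions of this example.

```python
from fractions import Fraction

p1, p2 = Fraction(3, 4), Fraction(1, 4)   # hypothetical request probabilities

# State 0: list order (key 1, key 2).  State 1: list order (key 2, key 1).
# Under move to front, a request for key 1 leads to state 0 and a request
# for key 2 leads to state 1, whatever the current state is.
P = [[p1, p2],
     [p1, p2]]

steady = [p1, p2]                          # candidate steady state, checked below
next_dist = [sum(steady[i] * P[i][j] for i in range(2)) for j in range(2)]

# c(s): expected comparisons for one request when the list is in state s.
cost = [p1 * 1 + p2 * 2,                   # key 1 in front
        p2 * 1 + p1 * 2]                   # key 2 in front

asymptotic_cost = sum(p * c for p, c in zip(steady, cost))
```

For these numbers the asymptotic cost is 11/8 comparisons per request,
compared with 5/4 for the fixed order that keeps key 1 in front.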


Theorem: Let $c_i$ be random variables denoting the cost of the state
the chain is in at time $i$. Then for any irreducible chain,

$$\lim_{t \to \infty} E\!\left(\frac{c_1 + c_2 + \cdots + c_t}{t}\right) = \sum_{s \in S} p(s)\,c(s)$$

$$\lim_{t \to \infty} \mathrm{VAR}\!\left(\frac{c_1 + c_2 + \cdots + c_t}{t}\right) = 0$$

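Proof: To prove the first statement, note $E(c_i) = \mathrm{COST}_i$.
Then

$$\lim_{t \to \infty} E\!\left(\frac{c_1 + c_2 + \cdots + c_t}{t}\right) = \lim_{t \to \infty} \frac{\sum_{i=1}^{t} \mathrm{COST}_i}{t} = \sum_{s \in S} p(s)\,c(s).$$

To prove the second statement,

$$\lim_{t \to \infty} \mathrm{VAR}\!\left(\frac{c_1 + c_2 + \cdots + c_t}{t}\right) = \lim_{t \to \infty} E\!\left[\left(\frac{c_1 + c_2 + \cdots + c_t}{t}\right)^{2}\right] - \left(\sum_{s \in S} p(s)\,c(s)\right)^{2}$$

$$= E\!\left[\left(\lim_{t \to \infty} \frac{c_1 + c_2 + \cdots + c_t}{t}\right)^{2}\right] - \left(\sum_{s \in S} p(s)\,c(s)\right)^{2} \qquad (1)$$

We now determine $\lim_{t \to \infty} \frac{c_1 + c_2 + \cdots + c_t}{t}$.
Let $N_t(s)$ be the number of times state $s$ is visited in $t$
transitions. Then $c_1 + c_2 + \cdots + c_t$ equals the number of times
we visit each state times its cost, summed over all states. Therefore

$$\lim_{t \to \infty} \frac{c_1 + c_2 + \cdots + c_t}{t} = \lim_{t \to \infty} \frac{\sum_{s \in S} N_t(s)\,c(s)}{t} = \sum_{s \in S} \left(\lim_{t \to \infty} \frac{N_t(s)}{t}\right) c(s) = \sum_{s \in S} p(s)\,c(s)$$

by the Ergodic Theorem. Substituting this into (1) shows the variance
is zero. $\square$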

Another important question is how quickly the chain approaches
steady state. We can tell this from the eigenvalues of the transition
matrix. To see this, suppose the $n$ eigenvectors $y_1, \ldots, y_n$
(with eigenvalues $\lambda_1, \ldots, \lambda_n$) span the space of all
probability distributions. We can then write an initial distribution
$x_0$ as a linear combination of the $y_i$:

$$x_0 = c_1 y_1 + \cdots + c_n y_n.$$

So after $t$ transitions, the distribution will be

$$x_0 P^t = c_1 \lambda_1^t y_1 + \cdots + c_n \lambda_n^t y_n.$$

Since the chain has a stationary distribution, there is some $x$ such
that $xP = x$. Hence $x$ is an eigenvector with eigenvalue 1. If the
chain is closed, irreducible and aperiodic, we can show that all other
eigenvalues have modulus strictly less than 1. If we suppose
$\lambda_1$ is the eigenvalue equal to 1, we get

$$x_0 P^t = x + \lambda_2^t c_2 y_2 + \lambda_3^t c_3 y_3 + \cdots + \lambda_n^t c_n y_n.$$

Then as $t \to \infty$, $x_0 P^t \to x$, and the rate of convergence is
limited by the size of the other $n-1$ eigenvalues and eigenvectors.
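
The role of the remaining eigenvalues can be seen on a two-state chain,
where everything is available in closed form. The Python sketch below is
an illustration under stated assumptions: the transition probabilities a
and b are hypothetical, and the second eigenvalue is obtained from the
trace of the matrix (the two eigenvalues sum to 2 - a - b and one of
them is 1).

```python
from fractions import Fraction


def step(x, P):
    """Return the row vector x P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


a, b = Fraction(1, 2), Fraction(1, 4)      # hypothetical transition probabilities
P = [[1 - a, a],
     [b, 1 - b]]

second_eigenvalue = 1 - a - b              # trace(P) minus the eigenvalue 1
steady = [b / (a + b), a / (a + b)]        # satisfies steady P = steady

# Track how far the first entry of x_0 P^t is from its steady state value.
x = [Fraction(1), Fraction(0)]
deviations = []
for t in range(5):
    deviations.append(x[0] - steady[0])
    x = step(x, P)

# Ratio of successive deviations: the per-step contraction factor.
ratios = [deviations[t + 1] / deviations[t] for t in range(4)]
```

Here every step shrinks the distance to the steady state by exactly the
second eigenvalue, 1/4, so a second eigenvalue closer to 1 in modulus
would mean slower convergence.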